[ https://issues.apache.org/jira/browse/STORM-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045388#comment-16045388 ]

Prasanna Ranganathan commented on STORM-2343:
---------------------------------------------

{quote}
Instead of pausing nonretriable partitions, we could instead keep track of 
numUncommittedOffsets per partition, so we can pause only those partitions that 
have no retriable tuples and are at the maxUncommittedOffsets limit. That way 
unhealthy partitions can't block healthy partitions, and we avoid the case 
described above where a failed tuple on one partition causes new (limit 
breaking) tuples to be emitted on a different partition.
{quote}

Yes, this is what we should do: pause partitions only in the scenario above. In 
the special case of a spout handling a single partition, we can simply skip 
poll() instead of pausing when this condition is met.

Noted your update on the Kafka consumer pause being locally managed; makes sense.

STORM-2542 is interesting. Will comment there once I catch up on it.

> New Kafka spout can stop emitting tuples if more than maxUncommittedOffsets 
> tuples fail at once
> -----------------------------------------------------------------------------------------------
>
>                 Key: STORM-2343
>                 URL: https://issues.apache.org/jira/browse/STORM-2343
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.1.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.1
>
>          Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> It doesn't look like the spout is respecting maxUncommittedOffsets in all 
> cases. If the underlying consumer returns more records in a call to poll() 
> than maxUncommittedOffsets, they will all be added to waitingToEmit. Since 
> poll may return up to 500 records by default (Kafka 0.10.1.1), this is pretty 
> likely to happen with low maxUncommittedOffsets.
> The spout only checks for tuples to retry if it decides to poll, and it only 
> decides to poll if numUncommittedOffsets < maxUncommittedOffsets. Since 
> maxUncommittedOffsets isn't being respected when retrieving or emitting 
> records, numUncommittedOffsets can be much larger than maxUncommittedOffsets. 
> If more than maxUncommittedOffsets messages fail, this can cause the spout to 
> stop polling entirely.
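The failure mode described in the issue reduces to a simple comparison. The sketch below is purely illustrative, using Kafka 0.10.1.1's default max.poll.records of 500 and a hypothetical maxUncommittedOffsets of 100; the class and method names are not Storm's.

```java
// Illustrative sketch of how a single poll can overshoot the limit and
// permanently stop the spout from polling (and hence from retrying).
class MaxUncommittedOvershoot {
    // The spout decides to poll (and only then checks for tuples to retry)
    // while numUncommittedOffsets < maxUncommittedOffsets.
    static boolean willPoll(int numUncommittedOffsets, int maxUncommittedOffsets) {
        return numUncommittedOffsets < maxUncommittedOffsets;
    }

    public static void main(String[] args) {
        int maxUncommittedOffsets = 100;
        // One poll may return up to 500 records (Kafka 0.10.1.1 default),
        // all of which are added to waitingToEmit and emitted:
        int numUncommittedOffsets = 500;
        // If more than maxUncommittedOffsets of those tuples now fail, the
        // spout never polls again, so the retry logic is never reached.
        System.out.println(willPoll(numUncommittedOffsets, maxUncommittedOffsets)); // prints false
    }
}
```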



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
