GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/1924

    STORM-2343: New Kafka spout can stop emitting tuples if more than 
maxUncommittedOffsets tuples fail at once

    This builds on https://github.com/apache/storm/pull/1832; sorry for the 
mixed diff.
    
    This makes the spout emit failed tuples even when numUncommittedOffsets has 
reached maxUncommittedOffsets. Previously the spout would stop making progress 
in that case. I couldn't find a way to enforce maxUncommittedOffsets strictly 
without requesting extra messages from Kafka and throwing them away. The 
consumer doesn't let the spout dynamically limit how many records it requests, 
and I'd prefer that the spout not truncate the list of records it receives.
    
    Instead, maxUncommittedOffsets is now a soft cap on the number of tuples 
that can be emitted before some must be committed. In some cases (e.g. 10 
partitions with 1 failed message emitted on each), the spout will exceed the 
limit. It should never exceed it by more than maxPollRecords, though.
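
    A minimal sketch of the soft-cap rule described above, in Java. The class 
and method names (SoftCapSketch, mayPoll, worstCaseUncommitted) are 
hypothetical and are not the spout's actual code. The bound only restates the 
claim above that the overshoot is at most maxPollRecords.

```java
// Hypothetical illustration of the soft cap; not the spout's real implementation.
final class SoftCapSketch {

    // The spout may poll while it is under the cap, or when failed tuples
    // are ready to be retried (this is what prevents it from stalling).
    static boolean mayPoll(long numUncommittedOffsets,
                           long maxUncommittedOffsets,
                           boolean hasTuplesReadyForRetry) {
        return numUncommittedOffsets < maxUncommittedOffsets || hasTuplesReadyForRetry;
    }

    // Upper bound on uncommitted offsets, assuming the overshoot is at most
    // one poll's worth of records, as stated in the description.
    static long worstCaseUncommitted(long maxUncommittedOffsets, int maxPollRecords) {
        return maxUncommittedOffsets + maxPollRecords;
    }
}
```

    With maxUncommittedOffsets = 10 and maxPollRecords = 3, this sketch allows 
a retry poll at 10 uncommitted offsets and bounds the total at 13.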
    
    This also makes KafkaSpoutRetryExponentialBackoff use Storm Time, and it 
fixes a bug where messages could be lost if they were scheduled for retry at 
the same timestamp. It also fixes double counting of failed tuples in 
numUncommittedOffsets when retrying, which happened because the counter isn't 
decreased when a tuple is scheduled for retry.
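
    One way messages scheduled at the same timestamp can be lost is a sorted 
set whose comparator looks only at the retry time, so two entries with equal 
times count as duplicates. That mechanism is an assumption for illustration; 
the description above does not say how the bug arose. RetryEntry and 
RetryOrderingSketch are hypothetical names.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical retry entry: an offset and the time at which it may be retried.
final class RetryEntry {
    final long offset;
    final long nextRetryTimeMs;

    RetryEntry(long offset, long nextRetryTimeMs) {
        this.offset = offset;
        this.nextRetryTimeMs = nextRetryTimeMs;
    }
}

final class RetryOrderingSketch {
    // Orders by retry time only: entries with equal times compare as equal.
    static final Comparator<RetryEntry> BY_TIME_ONLY =
            Comparator.comparingLong(e -> e.nextRetryTimeMs);

    // Breaks ties on offset so distinct entries never compare as equal.
    static final Comparator<RetryEntry> BY_TIME_THEN_OFFSET =
            BY_TIME_ONLY.thenComparingLong(e -> e.offset);

    // Schedules two different offsets for the same retry time and reports
    // how many entries the sorted set actually kept.
    static int scheduledCount(Comparator<RetryEntry> order) {
        TreeSet<RetryEntry> schedule = new TreeSet<>(order);
        schedule.add(new RetryEntry(10L, 1_000L));
        schedule.add(new RetryEntry(11L, 1_000L));
        return schedule.size();
    }
}
```

    With BY_TIME_ONLY the set keeps one entry and silently drops the other; 
with BY_TIME_THEN_OFFSET it keeps both.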
    
    maxPollRecords is now capped by maxUncommittedOffsets, because a higher 
maxPollRecords would make the spout exceed the limit on any poll where Kafka 
returns maxPollRecords messages.
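
    The capping can be pictured as taking the smaller of the two settings. 
This is a sketch under that assumption; PollCapSketch and 
effectiveMaxPollRecords are hypothetical names, not the spout's API.

```java
// Hypothetical illustration of capping maxPollRecords by maxUncommittedOffsets.
final class PollCapSketch {

    // A single poll from an empty spout can return up to maxPollRecords
    // records, so anything above maxUncommittedOffsets would break the cap.
    static int effectiveMaxPollRecords(int maxPollRecords, int maxUncommittedOffsets) {
        return Math.min(maxPollRecords, maxUncommittedOffsets);
    }
}
```

    For example, maxPollRecords = 500 with maxUncommittedOffsets = 100 yields 
an effective poll size of 100, while 50 with 100 is left at 50.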

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-2343

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1924
    
----
commit d7432f8e39c2dd91902bb281be6381d8ce4d53fe
Author: Stig Rohde Døssing <[email protected]>
Date:   2017-02-04T08:04:19Z

    STORM-2250: Kafka Spout Refactoring to Increase Modularity and Testability

commit 1c3f9ab53995236cb1a92ba8a51cdfcf73a21bfc
Author: Stig Rohde Døssing <[email protected]>
Date:   2017-02-05T18:46:49Z

    STORM-2343: Fix new kafka spout stopping processing if more than 
maxUncommittedOffsets tuples fail at once

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
