GitHub user srdo opened a pull request:
https://github.com/apache/storm/pull/1924
STORM-2343: New Kafka spout can stop emitting tuples if more than
maxUncommittedOffsets tuples fail at once
This builds on https://github.com/apache/storm/pull/1832, sorry for the
mixed diff
This makes the spout emit failed tuples even when numUncommittedOffsets has
reached maxUncommittedOffsets. Previously the spout would fail to progress in
that case. I haven't really been able to enforce maxUncommittedOffsets strictly
without having to request extra messages from Kafka and throwing them away. The
consumer doesn't allow the spout to limit how many tuples it requests
dynamically, and I'd prefer that the spout doesn't truncate the list of records
it receives.
Instead, maxUncommittedOffsets is now a soft cap on the number of tuples
that can be emitted before some must be committed. In some cases (e.g. 10
partitions with 1 failed message emitted on each), the spout will exceed the
limit. It should never be by more than 1 maxPollRecords size though.
This also makes KafkaSpoutRetryExponentialBackoff use Storm Time, and it
fixes a bug where messages could be lost if they were scheduled for retry at
the same timestamp. It also fixes double counting failed tuples in
numUncommittedOffsets when retrying, since the counter isn't decreased when the
tuple is scheduled for retry.
maxPollRecords is now capped by maxUncommittedOffsets, since if
maxPollRecords is higher the spout will exceed the limit on any poll where
Kafka can return maxPollRecords messages.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-2343
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1924.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1924
----
commit d7432f8e39c2dd91902bb281be6381d8ce4d53fe
Author: Stig Rohde Døssing <[email protected]>
Date: 2017-02-04T08:04:19Z
STORM-2250: Kafka Spout Refactoring to Increase Modularity and Testability
commit 1c3f9ab53995236cb1a92ba8a51cdfcf73a21bfc
Author: Stig Rohde Døssing <[email protected]>
Date: 2017-02-05T18:46:49Z
STORM-2343: Fix new kafka spout stopping processing if more than
maxUncommittedOffsets tuples fail at once
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---