GitHub user ernisv opened a pull request:
https://github.com/apache/storm/pull/1888
STORM-2296 Kafka spout no dup on leader changes 1 x
Currently the Kafka spout emits duplicate tuples whenever the leader of a
Kafka topic partition changes.
When a leader change causes an exception, the PartitionManagers are simply
recreated. This loses the state about which tuples were already emitted, so
the new PartitionManager emits them again.
This does not violate the at-least-once guarantee, but it is better not to
emit duplicate data where possible.
The duplicates can be avoided easily by moving the state about emitted
tuples from the old PartitionManager to the new one.
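The idea can be illustrated with a minimal, self-contained sketch. It is not
the code from this pull request: SketchPartitionManager and SketchCoordinator
are hypothetical stand-ins, and the only detail taken from the commits below
is that the emitted-to offset is part of the state that gets copied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a per-partition manager.
// This is NOT the real storm-kafka PartitionManager.
class SketchPartitionManager {
    final int partition;
    // Offsets that were emitted but not yet acked.
    final SortedSet<Long> pending = new TreeSet<>();
    // Next offset to emit (assumed analogue of _emittedToOffset).
    long emittedToOffset;

    // Fresh manager: starts from a given offset with no emitted state.
    SketchPartitionManager(int partition, long startOffset) {
        this.partition = partition;
        this.emittedToOffset = startOffset;
    }

    // Recreated manager: takes over the emitted state of the previous one.
    SketchPartitionManager(int partition, SketchPartitionManager previous) {
        this.partition = partition;
        this.pending.addAll(previous.pending);
        this.emittedToOffset = previous.emittedToOffset;
    }

    // Emits the next offset and records it as pending.
    long emitNext() {
        long offset = emittedToOffset++;
        pending.add(offset);
        return offset;
    }
}

// Hypothetical coordinator that rebuilds managers, e.g. after a leader change.
class SketchCoordinator {
    private final Map<Integer, SketchPartitionManager> managers = new HashMap<>();

    void refresh(int... partitions) {
        Map<Integer, SketchPartitionManager> rebuilt = new HashMap<>();
        for (int p : partitions) {
            SketchPartitionManager old = managers.get(p);
            // Same partition as before: move the state instead of starting over.
            rebuilt.put(p, old == null
                    ? new SketchPartitionManager(p, 0L)
                    : new SketchPartitionManager(p, old));
        }
        managers.clear();
        managers.putAll(rebuilt);
    }

    SketchPartitionManager get(int partition) {
        return managers.get(partition);
    }
}
```

In this sketch, calling refresh() a second time for the same partition yields
a new manager object that continues from the old emitted-to offset rather
than re-emitting from the start.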
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ernisv/storm kafka_spout_no_dup_on_leader_changes_1_x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1888.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1888
----
commit fb9c3073f5babc35828abdcf897db31846cabecc
Author: Ernestas Vaiciukevicius <[email protected]>
Date: 2017-01-12T14:54:59Z
Move state from old PartitionManager when recreating manager for same
partition
commit aefd80a5404b726d1dd538018b4f7f0bca119627
Author: Ernestas Vaiciukevicius <[email protected]>
Date: 2017-01-12T15:39:51Z
Test to check if old PartitionManager's state is moved to new manager
during manager recreate
commit c744e4b7dcbca5082243a691d97f12cd4b1151c3
Author: Ernestas Vaiciukevicius <[email protected]>
Date: 2017-01-12T15:57:46Z
Include _emittedToOffset when copying state during PartitionManager recreate
----