GitHub user srdo opened a pull request:
https://github.com/apache/storm/pull/2249
WIP: STORM-2648/STORM-2357: Add storm-kafka-client support for
at-most-once processing and a toggle for whether messages should be emitted
with a message id when not using at-least-once
See https://issues.apache.org/jira/browse/STORM-2357 and
https://issues.apache.org/jira/browse/STORM-2648.
I'd like to get some opinions on whether this approach is a good idea, or
whether I've overlooked a better option, before finishing this up with some
tests. I don't love that we'll end up with 3 different committing behaviors.
STORM-2357 notes that the spout doesn't currently support true
at-most-once, because using Kafka's auto commit option leaves the possibility
that the spout receives a tuple, emits it to the topology, crashes and
recovers, and then receives and emits the same tuple. The linked issue suggests
solving this by committing polled offsets before emitting them to the topology,
which is an option added here.
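
To illustrate why the order of committing and emitting matters, here is a
minimal sketch in plain Java. It is not the spout code from this pull request:
the class, the Order enum and the simulated crash flag are all hypothetical,
and the committed offset is just a field rather than a real Kafka commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not storm-kafka-client code.
public class CommitOrderSketch {
    public enum Order { COMMIT_THEN_EMIT, EMIT_THEN_COMMIT }

    // Stand-in for the committed offset Kafka would hand back after a restart.
    private long committedOffset = 0;
    // Offsets that reached the topology, in emit order.
    public final List<Long> emitted = new ArrayList<>();

    // Polls batchSize offsets from the committed position. If crashBetweenSteps
    // is true, the method returns between the two steps to simulate a crash.
    public void pollOnce(Order order, int batchSize, boolean crashBetweenSteps) {
        long start = committedOffset;
        long end = start + batchSize;
        if (order == Order.COMMIT_THEN_EMIT) {
            committedOffset = end;              // commit first
            if (crashBetweenSteps) return;      // crash: batch is lost, never repeated
            for (long o = start; o < end; o++) emitted.add(o);
        } else {
            for (long o = start; o < end; o++) emitted.add(o);
            if (crashBetweenSteps) return;      // crash: batch will be read again
            committedOffset = end;              // commit after emitting
        }
    }

    // One crashed poll followed by one successful poll after "recovery".
    public static List<Long> run(Order order) {
        CommitOrderSketch spout = new CommitOrderSketch();
        spout.pollOnce(order, 3, true);
        spout.pollOnce(order, 3, false);
        return spout.emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(Order.COMMIT_THEN_EMIT)); // [3, 4, 5]
        System.out.println(run(Order.EMIT_THEN_COMMIT)); // [0, 1, 2, 0, 1, 2]
    }
}
```

Committing first loses offsets 0 to 2 but never repeats them, which is the
at-most-once behavior. Emitting first, as can happen with auto commit, emits
0 to 2 twice.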
STORM-2648 notes that there is currently no way to make Storm track messages when
using auto commit with this spout. This prevents Storm UI from showing the
complete latency for the spout, and I would assume also prevents max spout
pending from having an effect. I've added a toggle to KafkaSpoutConfig to force
the spout to emit messages with message ids, even when using auto commit or the
at-most-once option. The spout does nothing on ack or fail when not doing
at-least-once.
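
As a rough sketch of how the toggle is meant to behave (plain Java, with
hypothetical names that are not the actual KafkaSpoutConfig or KafkaSpout
API): a tuple gets a message id if the spout is doing at-least-once or if
tracking is forced, and ack/fail only do work on the at-least-once path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the toggle, not the real spout classes.
public class TrackingToggleSketch {
    private final boolean atLeastOnce;
    private final boolean forceTracking;

    // Offsets recorded as acked or scheduled for retry (at-least-once only).
    public final List<Long> ackedOffsets = new ArrayList<>();
    public final List<Long> retryOffsets = new ArrayList<>();

    public TrackingToggleSketch(boolean atLeastOnce, boolean forceTracking) {
        this.atLeastOnce = atLeastOnce;
        this.forceTracking = forceTracking;
    }

    // True if tuples should be emitted with a message id so Storm tracks them.
    public boolean emitsWithMessageId() {
        return atLeastOnce || forceTracking;
    }

    // Acks only matter when the spout commits based on them.
    public void ack(long offset) {
        if (atLeastOnce) ackedOffsets.add(offset);
    }

    // Fails only trigger a retry on the at-least-once path.
    public void fail(long offset) {
        if (atLeastOnce) retryOffsets.add(offset);
    }

    public static void main(String[] args) {
        TrackingToggleSketch tracked = new TrackingToggleSketch(false, true);
        tracked.fail(7L);
        System.out.println(tracked.emitsWithMessageId()); // true
        System.out.println(tracked.retryOffsets);         // []
    }
}
```

With tracking forced but at-least-once off, Storm still sees message ids, but
a failed tuple is not replayed.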
I'd like to keep the spout config simple for the user, so I've added a
processing guarantee setting corresponding to the standard at-least-once code
path, the path that uses auto commit, and the path that commits offsets before
emitting any tuples.
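
For discussion, the three code paths could be modeled roughly like the enum
below. The enum name, constant names and methods are placeholders I made up
for this sketch and may not match what ends up in the patch; the only real
name referenced is Kafka's enable.auto.commit consumer property.

```java
// Hypothetical sketch of a processing guarantee setting, not the final API.
public enum GuaranteeSketch {
    // Standard path: the spout commits offsets once their tuples are acked.
    AT_LEAST_ONCE(false, "after the tuples for those offsets are acked"),
    // Auto commit path: the Kafka consumer commits on its own schedule.
    AUTO_COMMIT(true, "periodically by the Kafka consumer itself"),
    // New path: the spout commits polled offsets before emitting anything.
    AT_MOST_ONCE(false, "right after polling, before any tuple is emitted");

    private final boolean kafkaAutoCommit;
    private final String commitPoint;

    GuaranteeSketch(boolean kafkaAutoCommit, String commitPoint) {
        this.kafkaAutoCommit = kafkaAutoCommit;
        this.commitPoint = commitPoint;
    }

    // Whether the consumer would run with enable.auto.commit=true.
    public boolean usesKafkaAutoCommit() {
        return kafkaAutoCommit;
    }

    // Whether the spout itself is responsible for committing offsets.
    public boolean spoutCommitsOffsets() {
        return !kafkaAutoCommit;
    }

    public String commitPoint() {
        return commitPoint;
    }

    public static void main(String[] args) {
        for (GuaranteeSketch g : values()) {
            System.out.println(g + ": enable.auto.commit=" + g.usesKafkaAutoCommit()
                    + ", offsets committed " + g.commitPoint());
        }
    }
}
```

A single setting like this would let the user pick one of the three behaviors
without having to combine several lower-level options by hand.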
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-2648
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2249.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2249
----
commit 4fc4b71f9720f506be20740f780dfef93f2dd036
Author: Stig Rohde Døssing <[email protected]>
Date: 2017-07-31T18:26:55Z
STORM-2648/STORM-2357: Add storm-kafka-client support for at-most-once
processing and a toggle for whether messages should be emitted with a message
id when not using at-least-once
----