Derek Dagit created STORM-746:
---------------------------------
Summary: Disable Spout Ack Init when there is no output task
Key: STORM-746
URL: https://issues.apache.org/jira/browse/STORM-746
Project: Apache Storm
Issue Type: Improvement
Affects Versions: 0.9.2-incubating
Reporter: Derek Dagit
Assignee: Derek Dagit
Priority: Minor
Suppose a user cannot easily modify the spout in a topology and, for
debugging, has temporarily disabled the transfer of tuples from that spout.
In this case, when acking is used, each time the spout emits, it sends a tuple
to the acker bolt. The bolt executes on this tuple by initializing the
bit-field used for tracking when the tuple "tree" has completed processing
(XOR-ing the new field with 0), then checking whether processing is complete
(by comparing the field to 0), and finally sending an ack in reply to the spout.
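The bookkeeping described above can be sketched roughly as follows. This is a simplified model for illustration only, not Storm's actual acker code; the class AckerSketch and its methods are hypothetical names. It assumes one 64-bit field per root tuple that is XOR-ed with each incoming value and treated as complete when it reaches 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the acker's bookkeeping (not Storm's code).
public class AckerSketch {
    // Root tuple id -> running XOR of every value reported for that tuple tree.
    private final Map<Long, Long> pending = new HashMap<>();

    /**
     * Handles an init or ack message for the given root tuple.
     * Returns true when the tree is complete, i.e. the field is 0,
     * which is the point where an ack would be sent back to the spout.
     */
    public boolean update(long rootId, long value) {
        // A new entry starts at 0, so the first message is XOR-ed with 0.
        long field = pending.getOrDefault(rootId, 0L) ^ value;
        if (field == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, field);
        return false;
    }

    /** Number of tuple trees still being tracked. */
    public int pendingCount() {
        return pending.size();
    }
}
```

In this model, a spout that emits to no downstream task sends an init value of 0, so update(rootId, 0L) reports completion immediately: the acker still does a full round of work for a tuple that never went anywhere.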
Normally, this is not a problem beyond the overhead, but on at least one
occasion in the course of debugging topology performance, the acker bolt's host
was so overloaded that it actually could not send the reply ack back to the
spout before the spout timed it out. This resulted in many Fails being
reported for tuples that were never supposed to go anywhere in the first
place, and in unnecessary counts against max.spout.pending, which evidently
made debugging harder still.
This was very confusing to the user.
I propose that we short-cut the ack init in the case when the spout does not
emit to any downstream tasks.
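A minimal sketch of what such a short-cut might look like, under stated assumptions: SpoutEmitSketch, AckerLink, and emit are hypothetical names, not Storm APIs, and the list of edge ids stands in for whatever the spout generates per downstream task. What the spout should then do about its own ack callback is not settled here and is left out.

```java
import java.util.List;

// Hypothetical sketch of the proposed short-cut (not Storm's code).
public class SpoutEmitSketch {
    /** Stand-in for the channel from the spout to the acker bolt. */
    public interface AckerLink {
        void sendInit(long rootId, long xorOfEdgeIds);
    }

    /**
     * Sends the ack-init message only when the tuple has at least one
     * downstream task. Returns true if an init message was sent.
     */
    public static boolean emit(long rootId, List<Long> edgeIds, AckerLink acker) {
        if (edgeIds.isEmpty()) {
            // Proposed short-cut: nothing was transferred, so skip the acker.
            return false;
        }
        long xor = 0L;
        for (long id : edgeIds) {
            xor ^= id; // combine the ids of all outgoing tuples
        }
        acker.sendInit(rootId, xor);
        return true;
    }
}
```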
I do have some misgivings already about making this change, as a spout emitting
nowhere could be considered outside the set of normal use cases for Storm.
That said, I will not be unhappy if someone gives a -1.