Roshan Naik created STORM-3314:
----------------------------------

             Summary: Acker redesign
                 Key: STORM-3314
                 URL: https://issues.apache.org/jira/browse/STORM-3314
             Project: Apache Storm
          Issue Type: Improvement
          Components: storm-client
            Reporter: Roshan Naik


*Context:* The ACKing mechanism has come focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues 
discussed in STORM-2359


*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.


*Determining "fully processed”* : For every incoming (parent) tuple, a bolt can 
emit 0 or more “child” tuples. the Parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.


This means that, when a bolt is finished with all the child emits and it calls 
ack() no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.


*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.


*Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
20 bytes per outstanding tuple-tree at each bolt. Assuming an average of say 
50k outstanding tuples at each level, we have 50k*20bytes = 1MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.


*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. Policy of when to emit them needs more thought. Good to avoid 
Timeouts/replays of inflight tuples under backpressure since this will lead to 
"event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have been dropped. Would be nice to avoid sending 
TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely 
to face backpressure as well.

On receiving an ACKs or REPLAYs from downstream components, a bolt needs to 
clears the corresponding 20 bytes tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to