The execute phase is pipelined and only the commits are strictly ordered. 

So a trident bolt could receive tuples from batch1, batch2 and again batch1 and 
so on. The framework internally maintains separate context for each batch and 
the execute is invoked with the respective batch’s context. The bolts could 
also emit tuples which are forwarded to the next bolt in the DAG without 
waiting for the batch to complete.

However the finsihBatch/commit is ordered. I.e commit for batch2 is invoked 
only after batch1 commit is successful.


Thanks,
Arun

On 2/8/16, 9:50 PM, "Felix Dreissig" <[email protected]> wrote:

>Hi,
>
>I’m re-posting this from Storm-User, as it didn’t get a reply there and 
>touches the internal implementation quite a bit.
>
>I am trying to understand the parallelism properties and transactional 
>semantics offered by Trident and couldn’t find an answer to these two 
>questions:
>
>1. The „Trident Spouts“ documentation [1] says that „[b]y default, Trident 
>processes a single batch at a time, waiting for the batch to succeed or fail 
>before trying another batch“.
>But do Trident bolts always wait until a batch is completed, collect the 
>results and then pass them on to the next bolt(s) as complete batches? Without 
>pipelining, this would mean that only one bolt can be active at a time, 
>effectively preventing any parallelism.
>Or are tuples entering a stream and being delivered to the next bolt(s) as 
>soon as they are emitted? This would still introduce some idle time and 
>increased latency without pipelining, but at least seems like a better 
>resource utilization.
>
>2. The idea that states will only ever have to deal with a new batch or one 
>from immediately before (assuming transactional or opaque-transactional 
>spouts) is at the core of Trident’s state model.
>On this topic, the docs section from above promises that even with pipelining, 
>„Trident will order any state updates taking place in the topology among 
>batches“. Is this some special guarantee for the built-in stateful operations 
>(i.e. partitionPersist and persistentAggregate, which afaics uses 
>partitionPersist internally), or can all bolts assume that they’ll never see 
>any batch repeated except the latest one they processed?
>
>I couldn’t find these questions covered in the docs or in previous 
>discussions. So I tried consulting the source code, but it’s not easily 
>comprehensible with regard to such issues.
>Any help would be highly appreciated.
>
>Best regards,
>Felix
>
>[1] https://storm.apache.org/documentation/Trident-spouts.html#pipelining

Reply via email to