Thanks for your quick replies, Bobby and Arun!

On 8 Feb 2016, at 18:03, Arun Mahadevan <[email protected]> wrote:
> The execute phase is pipelined and only the commits are strictly ordered. 
> 
> So a trident bolt could receive tuples from batch1, batch2 and again batch1 
> and so on. The framework internally maintains separate context for each batch 
> and the execute is invoked with the respective batch’s context. The bolts 
> could also emit tuples which are forwarded to the next bolt in the DAG 
> without waiting for the batch to complete.

Just to make sure I get this right: The intermixing of tuples from different 
batches only happens when pipelining is enabled, doesn’t it?

So, could the properties be summarized as follows?
Without pipelining: Tuples are assigned to a batch and emitted as soon as 
possible. When all tuples of a batch have completed processing, a commit is 
issued and afterwards, tuples of the next batch will begin processing.
With pipelining: Tuples assigned to multiple different batches (at most 
`topology.max.spout.pending` batches) may be active at a time. When all tuples 
of a batch have completed processing, results from that batch are committed. As 
long as the commit isn’t finished, no second commit will be started.
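
To make the summary above concrete, here is a toy model in plain Python. It is 
not Storm code and does not use any Storm API: `simulate`, `finish_order` and 
`max_pending` are names I made up, and I am assuming that a batch counts as 
pending until it has been committed. It only illustrates the behaviour I 
described, i.e. execution may interleave but commits happen in batch order.

```python
def simulate(finish_order, max_pending):
    """Toy model of batch processing with strictly ordered commits.

    finish_order: permutation of 1..n, the order in which batches would
                  finish processing if all of them could run at once.
    max_pending:  maximum number of uncommitted batches in flight
                  (stand-in for topology.max.spout.pending; an assumption).
    Returns (executed, commits): the order in which batches finish
    processing and the order in which they are committed.
    """
    n = len(finish_order)
    waiting = list(finish_order)   # batches that have not finished yet
    active = set()                 # emitted but not yet committed
    finished = set()               # finished processing, maybe uncommitted
    executed, commits = [], []
    next_emit, next_commit = 1, 1

    while len(commits) < n:
        # Emit new batches while there is room in the pending window.
        while next_emit <= n and len(active) < max_pending:
            active.add(next_emit)
            next_emit += 1

        # The next batch to finish is the first waiting one that is active.
        done = next(b for b in waiting if b in active)
        waiting.remove(done)
        finished.add(done)
        executed.append(done)

        # Commits are strictly ordered: batch k commits only after k-1.
        while next_commit in finished:
            commits.append(next_commit)
            active.discard(next_commit)
            next_commit += 1

    return executed, commits


# Pipelined: up to 3 batches in flight, processing finishes out of order.
print(simulate([2, 3, 1, 4], max_pending=3))
# Not pipelined: one batch at a time, so nothing can interleave.
print(simulate([2, 3, 1, 4], max_pending=1))
```

In this model, the first call finishes batches in the order 2, 3, 1, 4 but 
still commits them as 1, 2, 3, 4, while the second call processes and commits 
strictly one batch after another.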

Regards,
Felix
