Re: Trident pipelining and transactional properties

Bobby Evans Mon, 08 Feb 2016 09:11:25 -0800

Felix,
Trident tracks batches using the ackers and coordinators, so as soon as a 
trident spout outputs a tuple, it will start flowing through the topology.  
When that batch completes fully the coordinator sends out another set of 
messages to commit the results from that batch.  When that completes, the next 
batch is output.
The guarantee is that the batches will be committed in order, possibly with a 
repeat.  I have not dug into the details of how all of this works with the 
state guarantees though.
 - Bobby


    On Monday, February 8, 2016 10:20 AM, Felix Dreissig <[email protected]> wrote:
 

 Hi,

I’m re-posting this from Storm-User, as it didn’t get a reply there and touches 
the internal implementation quite a bit.

I am trying to understand the parallelism properties and transactional 
semantics offered by Trident and couldn’t find an answer to these two questions:

1. The „Trident Spouts“ documentation [1] says that „[b]y default, Trident 
processes a single batch at a time, waiting for the batch to succeed or fail 
before trying another batch“.
But do Trident bolts always wait until a batch is completed, collect the 
results and then pass them on to the next bolt(s) as complete batches? Without 
pipelining, this would mean that only one bolt can be active at a time, 
effectively preventing any parallelism.
Or are tuples entering a stream and being delivered to the next bolt(s) as soon 
as they are emitted? This would still introduce some idle time and increased 
latency without pipelining, but at least seems like a better resource 
utilization.

2. The idea that states will only ever have to deal with a new batch or one 
from immediately before (assuming transactional or opaque-transactional spouts) 
is at the core of Trident’s state model.
On this topic, the docs section from above promises that even with pipelining, 
„Trident will order any state updates taking place in the topology among 
batches“. Is this some special guarantee for the built-in stateful operations 
(i.e. partitionPersist and persistentAggregate, which afaics uses 
partitionPersist internally), or can all bolts assume that they’ll never see 
any batch repeated except the latest one they processed?

I couldn’t find these questions covered in the docs or in previous discussions. 
So I tried consulting the source code, but it’s not easily comprehensible with 
regard to such issues.
Any help would be highly appreciated.

Best regards,
Felix

[1] https://storm.apache.org/documentation/Trident-spouts.html#pipelining

Re: Trident pipelining and transactional properties

Reply via email to