Hi Anton,

Samza doesn’t have a built-in concept of an ack the way Storm does. That can be seen as a good or a bad thing: on one hand, acking is quite expensive in Storm; on the other, you can do what you are describing fairly easily yourself. Samza topologies also aren't restricted to DAGs - you can have jobs that feed back into themselves.
In Samza, the checkpoint in Kafka for each system consumer is probably the closest equivalent. You can use checkpointing to ensure that you have processed every message. In the case of fanning out to sub-tasks, I would suggest aggregating the results of each sub-task downstream with another job that can signal when every sub-task has completed (a rough sketch of that idea is below the quoted thread). Perhaps someone else has a better solution.

HTH,
Rick

> On Dec 2, 2015, at 1:41 AM, Anton Polyakov <polyakov.an...@gmail.com> wrote:
>
> Hi Rick
>
> But the processing pipeline is more than one job, and the jobs work in parallel, not serially. This is why a single "done" flag is not enough. Imagine a situation where one trade creates three sub-events and they are processed concurrently by three instances of the job. Then I need to wait for at least three flags. And if the pipeline is more complex, there can be even more flags, all of which I have to know in advance - so I am tied to a known topology, and if it changes, the "finished" condition changes too.
>
> On Wednesday, December 2, 2015, Rick Mangi <r...@chartbeat.com> wrote:
>
>> The simplest way would be to have your job send a message to another Kafka topic when it’s done.
>>
>> Rick
>>
>>> On Dec 1, 2015, at 3:44 PM, Anton Polyakov <polyakov.an...@gmail.com> wrote:
>>>
>>> Hi
>>>
>>> I am looking at Samza to process an incoming stream of trades. The processing pipeline is a complex DAG where some nodes may create zero to many descendant events. Ultimately these reach the end sink (and by then they are completely different events, all originating from one source trade).
>>>
>>> I am looking for an answer to a quite fundamental (in my view) question: given a source trade, how can I know whether it has been fully processed by the DAG or is still in flight? This is called lineage tracking in Storm, AFAIK.
>>>
>>> In my situation, when a trade event is fired I need a reliable way to determine when processing is done so that I can act on the results. I have googled a lot of algorithms - checkpointing in Flink, transactions in Storm. Checkpointing in Flink could work, but unfortunately I can’t use it from my source to mark a trade with a checkpoint barrier - the API is not available. Looking at Samza, there’s nothing obvious either.
>>>
>>> I have a very uncomfortable feeling that the problem should be very basic and fundamental; maybe I am just using the wrong terms to describe it. Take it to an extreme - one can send a single event to Samza and processing takes 15 minutes on some node. How, at the end of the pipeline, should I know when it’s done?
>>>
>>> Thank you
>>
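P.S. Here is a rough sketch of the downstream aggregator I had in mind, written against the low-level StreamTask API. Everything specific in it is an assumption on my part, not something Samza gives you out of the box: the "trade-done" topic, the "trade-completion-store" store name, and the message format (trade id as the key, a map carrying the expected number of sub-events) are all placeholders for however your fan-out job can tag its output.

```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

import java.util.Map;

/**
 * Aggregator job: consumes one "sub-task completed" message per sub-event
 * (keyed by the source trade id) and emits a single "trade done" message
 * once all expected sub-events for that trade have reported in.
 */
public class TradeCompletionAggregator implements StreamTask, InitableTask {
  // Hypothetical output topic carrying the final "done" signal per trade.
  private static final SystemStream TRADE_DONE = new SystemStream("kafka", "trade-done");

  // Local store tracking how many completions we have seen per trade.
  private KeyValueStore<String, Integer> seenCounts;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    seenCounts = (KeyValueStore<String, Integer>) context.getStore("trade-completion-store");
  }

  @Override
  @SuppressWarnings("unchecked")
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Assumed input format: key = trade id, message = map containing the total
    // number of sub-events the upstream job fanned out for this trade.
    String tradeId = (String) envelope.getKey();
    Map<String, Object> completion = (Map<String, Object>) envelope.getMessage();
    int expected = ((Number) completion.get("expectedSubEvents")).intValue();

    Integer seen = seenCounts.get(tradeId);
    int newSeen = (seen == null ? 0 : seen) + 1;

    if (newSeen >= expected) {
      // Every sub-task has reported: signal that the trade is fully processed.
      collector.send(new OutgoingMessageEnvelope(TRADE_DONE, tradeId, "done"));
      seenCounts.delete(tradeId);
    } else {
      seenCounts.put(tradeId, newSeen);
    }
  }
}
```

Because the completion messages are keyed by trade id, all of them for a given trade land in the same task instance, so the count stays consistent. You would still want a changelog configured for the store (stores.trade-completion-store.changelog) and checkpointing enabled (task.checkpoint.factory) so neither the counts nor the completion messages are lost on restart.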