Hi I am looking at Samza to process some incoming stream of trades. Processing pipeline is a complex DAG where some nodes might create zero to many descendant events. Ultimately they got to the end sink (and now these are completely different events all originated by one source trade)
I am looking for a answer to quite fundamental (in my view) question - given source trade, how could I know if it was fully processed by DAG or is it still in flight? It is called lineage tracking in Storm afaik. In my situation when trade event is fired I need a reliable way to determine when processing is done in order to treat results. I googled a lot of algorithms - checkpointing in Flink, transactions in Storm. Checkpointing in Flink could work but unfortunately I can’t use them from my source to mark a trade with checkpoint barrier - API is not available. Looking at Samza there’s nothing obvious either. I have a very uncomfortable feeling that the problem should be very basic and fundamental, maybe I am using wrong terms to explain it. take it to extreme - one can send 1 event to Samza and processing is takes 15 minutes on some node. How on the end of the pipeline should I know when its done? Thank you