Hi, Kishore, First I want some clarification on your use case. 1) Scenario 1: you still want the Samza jobs continuously running, while simply want to detect the end of a certain stream. On detection, do you need to unsubscribe from the stream? Or you are still OK receiving more messages from the stream? 2) Scenario 2: you want the Samza jobs to shutdown when detecting the end of a certain stream.
Which scenario are you targeting? Thanks! -Yi On Wed, Oct 14, 2015 at 9:33 AM, Kishore N C <kishor...@gmail.com> wrote: > Hi, > > Our data processing pipeline consists of a set of Samza jobs, that form a > DAG. Sometimes, we have to throw finite datasets into the Kafka topic that > acts as the entry point to the pipeline. Given that different Samza jobs in > the DAG could have varying latencies in terms of processing the records (or > could even temporarily fails or be stuck), how do I detect that my assembly > of jobs have finished processing all records? It's not as simple as > tallying the input and output record counts, as some jobs could be > filtering data, and others could be grouping records etc. > > Thanks, > > Kishore. >