I'm curious - has anyone built any Samza-based systems that use any notion
of stream progress, e.g. low watermarks, punctuations, or heartbeats? These
are described in the stream-processing literature [1] [2] [3] and
implemented in MillWheel [4] and Dataflow [5], but I have not seen these
techniques mentioned in connection with Samza (except briefly in
SAMZA-552 [6]).

A low watermark would serve several purposes: handling out-of-order
events, emitting the result of a stateful operation once all relevant
events have been processed, and cleaning up internal state that will never
be updated again, so that it does not grow without bound.
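
To make the idea concrete, here is a rough sketch of a low watermark
driving a windowed count. It is not Samza code: the class name, its
methods, and the heartbeat call are all hypothetical. It also assumes that
a heartbeat from a source is a promise that the source will send nothing
older than the heartbeat time, and that the low watermark is simply the
minimum progress across all sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a watermark-driven tumbling-window counter. */
class WatermarkWindowCounter {
    private final long windowSizeMs;
    // Highest event time (or heartbeat time) seen from each input source.
    private final long[] sourceProgress;
    // Window start time -> number of events counted so far.
    private final TreeMap<Long, Long> openWindows = new TreeMap<>();

    WatermarkWindowCounter(long windowSizeMs, int numSources) {
        this.windowSizeMs = windowSizeMs;
        this.sourceProgress = new long[numSources];
        Arrays.fill(sourceProgress, Long.MIN_VALUE);
    }

    /** Low watermark: the minimum progress reported by any source. */
    long lowWatermark() {
        long min = Long.MAX_VALUE;
        for (long p : sourceProgress) {
            min = Math.min(min, p);
        }
        return min;
    }

    /** Counts one event, then returns any windows that closed as {start, count}. */
    List<long[]> onEvent(int source, long eventTimeMs) {
        long start = Math.floorDiv(eventTimeMs, windowSizeMs) * windowSizeMs;
        // Events for a window that already closed are late and get dropped.
        if (start + windowSizeMs > lowWatermark()) {
            openWindows.merge(start, 1L, Long::sum);
        }
        sourceProgress[source] = Math.max(sourceProgress[source], eventTimeMs);
        return closeCompletedWindows();
    }

    /** Advances one source's progress without counting anything. */
    List<long[]> onHeartbeat(int source, long timeMs) {
        sourceProgress[source] = Math.max(sourceProgress[source], timeMs);
        return closeCompletedWindows();
    }

    int openWindowCount() {
        return openWindows.size();
    }

    /** Emits and forgets every window that ends at or before the watermark. */
    private List<long[]> closeCompletedWindows() {
        long watermark = lowWatermark();
        List<long[]> closed = new ArrayList<>();
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + windowSizeMs <= watermark) {
            Map.Entry<Long, Long> e = openWindows.pollFirstEntry();
            closed.add(new long[] {e.getKey(), e.getValue()});
        }
        return closed;
    }
}
```

Each of the three purposes shows up here: events from different sources
may interleave in any order, a window's count is emitted only once the
watermark passes the end of the window, and the window's state is removed
at that point so the map stays bounded.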

Just wondering whether techniques like these would be useful in Samza job
pipelines, or whether other approaches in Samza make them unnecessary.

Thanks,
Zach

[1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390
[2] http://dl.acm.org/citation.cfm?id=1055596
[3] http://dl.acm.org/citation.cfm?id=1453890
[4] http://research.google.com/pubs/pub41378.html
[5] http://research.google.com/pubs/pub43864.html
[6] https://issues.apache.org/jira/browse/SAMZA-552
