I'm curious - has anyone built a Samza-based system that uses some notion of stream progress, e.g. low watermarks, punctuations, or heartbeats? These are described in the stream-processing literature [1] [2] [3] and implemented in MillWheel [4] and Dataflow [5], but I have not seen these techniques mentioned in connection with Samza (except briefly in SAMZA-552 [6]).
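To make the idea concrete, here is a rough sketch of what I mean by a low watermark. It is plain Python, not Samza code, and the names (WindowedCounter, process, heartbeat) are made up for illustration. It assumes timestamps are non-negative and non-decreasing within each input partition, and it takes the low watermark to be the minimum progress across partitions.

```python
from collections import defaultdict


class WindowedCounter:
    """Counts events per tumbling window (hypothetical example class).

    A window's count is emitted, and its state dropped, once the low
    watermark has passed the end of the window.
    """

    def __init__(self, partitions, window_size):
        self.window_size = window_size
        # Highest timestamp seen so far on each input partition.
        # Starting at 0 assumes timestamps are non-negative.
        self.progress = {p: 0 for p in partitions}
        # Open window state: window start -> event count.
        self.counts = defaultdict(int)

    def low_watermark(self):
        # Assumption: timestamps never decrease within a partition, so no
        # partition will deliver an event older than this value.
        return min(self.progress.values())

    def process(self, partition, timestamp):
        """Handle one event; return (window_start, count) pairs that are now final."""
        start = timestamp - timestamp % self.window_size
        self.counts[start] += 1
        self.progress[partition] = max(self.progress[partition], timestamp)
        return self._close_windows()

    def heartbeat(self, partition, timestamp):
        """Advance an idle partition's progress without adding an event."""
        self.progress[partition] = max(self.progress[partition], timestamp)
        return self._close_windows()

    def _close_windows(self):
        wm = self.low_watermark()
        closed = sorted(s for s in self.counts if s + self.window_size <= wm)
        # Emit and delete closed windows so state cannot grow without bound.
        return [(s, self.counts.pop(s)) for s in closed]


counter = WindowedCounter(partitions=["p0", "p1"], window_size=10)
results = []
results += counter.process("p0", 3)    # falls in window [0, 10)
results += counter.process("p1", 12)   # p1 is already ahead, in window [10, 20)
results += counter.process("p0", 7)    # older than the event above: out of order across partitions
results += counter.process("p0", 11)   # watermark reaches 11, so window [0, 10) is final
print(results)
```

The event with timestamp 7 arrives after the one with timestamp 12, yet it is still counted in the right window, because that window is only emitted once every partition has moved past it. The heartbeat method covers the case where one partition goes quiet and would otherwise hold the watermark back indefinitely.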
The purpose of something like a low watermark would include handling out-of-order events, emitting the result of a stateful operation once all relevant events have been processed, and cleaning up internal state that will never be updated again, so that it does not grow without bound. Just wondering whether techniques like these would be useful in Samza job pipelines, or whether Samza already has approaches that make them unnecessary.

Thanks,
Zach

[1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390
[2] http://dl.acm.org/citation.cfm?id=1055596
[3] http://dl.acm.org/citation.cfm?id=1453890
[4] http://research.google.com/pubs/pub41378.html
[5] http://research.google.com/pubs/pub43864.html
[6] https://issues.apache.org/jira/browse/SAMZA-552