Hi, Zach,

Glad that you pointed it out! Actually, the design description in SAMZA-552 has adopted many of the ideas behind high-watermark and late-arrivals from MillWheel. The terms used in the design doc may be different, since they were chosen before we discovered the MillWheel presentation. But in essence, the goal of SAMZA-552 (mainly the windowing technique described there) is to implement those concepts of high-watermark/late-arrivals in Samza.

We are planning to move forward with SAMZA-552 and are more than happy to discuss it in much more detail if you are interested.

Thanks!

-Yi

On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <zcox...@gmail.com> wrote:

> I'm curious - has anyone built any Samza-based systems that use any notion
> of stream progress, e.g. low watermarks, punctuations, or heartbeats? These
> are described in the stream-processing literature [1] [2] [3] and
> implemented in MillWheel [4] and Dataflow [5] but I have not seen any
> mention of these techniques related to Samza (except for briefly in
> Samza-552 [6]).
>
> The purpose of something like a low watermark would include handling
> out-of-order events, outputting the result of a stateful operation after
> all relevant events have been processed, and cleaning up internal state
> that will never again be updated to avoid unbounded growth.
>
> Just wondering if techniques like these would be useful in Samza job
> pipelines, or if there are various approaches in Samza that make them
> unnecessary.
>
> Thanks,
> Zach
>
> [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390
> [2] http://dl.acm.org/citation.cfm?id=1055596
> [3] http://dl.acm.org/citation.cfm?id=1453890
> [4] http://research.google.com/pubs/pub41378.html
> [5] http://research.google.com/pubs/pub43864.html
> [6] https://issues.apache.org/jira/browse/SAMZA-552
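
For readers following the thread, a minimal sketch of the three uses of a low watermark named in the question above (handling out-of-order events, emitting a result once all relevant events have been seen, and cleaning up state) may help. It is plain Python, not Samza API and not the SAMZA-552 design. The class name `WatermarkWindowCounter`, the tumbling-window layout, and the heuristic of taking the watermark as the largest event time seen minus a fixed `allowed_lateness` are all assumptions made for illustration; systems such as MillWheel derive the watermark differently.

```python
from collections import defaultdict


class WatermarkWindowCounter:
    """Hypothetical example: counts events per tumbling event-time window
    and closes a window once the watermark passes its end."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> event count
        self.late_events = 0

    def watermark(self):
        # Assumed heuristic: no event older than this is expected any more.
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        """Handle one event; return the (window_start, count) pairs that closed."""
        start = (event_time // self.window_size) * self.window_size

        # Late arrival: the window was already closed by the watermark.
        if start + self.window_size <= self.watermark():
            self.late_events += 1
            return []

        # Out-of-order events are fine as long as their window is still open.
        self.open_windows[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

        # Emit every window the watermark has passed, and drop its state
        # so that memory use stays bounded.
        wm = self.watermark()
        closed = sorted(s for s in self.open_windows
                        if s + self.window_size <= wm)
        return [(s, self.open_windows.pop(s)) for s in closed]


if __name__ == "__main__":
    counter = WatermarkWindowCounter(window_size=10, allowed_lateness=5)
    for t in [1, 7, 3, 12, 16, 4, 27]:
        for window_start, count in counter.process(t):
            print(f"window [{window_start}, {window_start + 10}) -> {count}")
    print("late events dropped:", counter.late_events)
```

In this run the event at time 3 arrives out of order but is still counted in window [0, 10), while the event at time 4 arrives after the watermark has passed that window and is treated as late. Whether late events should be dropped, as here, or used to amend an earlier result is a design choice this sketch does not try to settle.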