Hi Yi, I read through the SAMZA-552 design [1] and have some questions: is the low-watermark concept included in the Window Metadata? Does the window operator compute the low-watermark from timestamps on incoming events?
Thanks, Zach [1] https://issues.apache.org/jira/secure/attachment/12708934/DESIGN-SAMZA-552-7.pdf On Wed, Jan 13, 2016 at 12:23 AM Yi Pan <nickpa...@gmail.com> wrote: > Hi, Zach, > > Glad that you pointed it out! Actually, the design description in SAMZA-552 > has adopt a lot of flavors of high-watermark, late-arrivals, from > MillWheel. The terms used in the design doc maybe different since the terms > in the doc were used earlier than we discovered the MillWheel presentation. > But in essense, the goal of SAMZA-552 (i.e. mainly, the windowing technique > described there) is targeted to implement those concepts of > high-watermark/late-arrivals in Samza. > > We are planning to move forward in SAMZA-552 and are more than happy to > discuss it in much more details if you are interested. > > Thanks! > > -Yi > > On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <zcox...@gmail.com> wrote: > > > I'm curious - has anyone built any Samza-based systems that use any > notion > > of stream progress, e.g. low watermarks, punctuations, or heartbeats? > These > > are described in the stream-processing literature [1] [2] [3] and > > implemented in MillWheel [4] and Dataflow [5] but I have not seen any > > mention of these techniques related to Samza (except for briefly in > > Samza-552 [6]). > > > > The purpose of something like a low watermark would include handling > > out-of-order events, outputting the result of a stateful operation after > > all relevant events have been processed, and cleaning up internal state > > that will never again be updated to avoid unbounded growth. > > > > Just wondering if techniques like these would be useful in Samza job > > pipelines, or if there are various approaches in Samza that make them > > unnecessary. > > > > Thanks, > > Zach > > > > [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390 > > [2] http://dl.acm.org/citation.cfm?id=1055596 > > [3] http://dl.acm.org/citation.cfm?id=1453890 > > [4] http://research.google.com/pubs/pub41378.html > > [5] http://research.google.com/pubs/pub43864.html > > [6] https://issues.apache.org/jira/browse/SAMZA-552 > > >