On Tue, Mar 7, 2017 at 10:46 PM, Bhupesh Chawda <bhup...@apache.org> wrote:
> Hi All,
>
> Watermark tuples in Apex are very tightly coupled to event time processing.
> For this reason, they are usually modeled as having a timestamp.
>
> public interface WatermarkTuple
> {
>   long getTimestamp();
> }
>
> Even though watermarks are meant for such time-related processing, I think
> we should expand the concept of watermarks to the following types:
>
> 1. Labelled watermarks
> This could be useful in scenarios where, instead of a timestamp (which is an
> ordered field), we have categorical values. For example, consider tuples
> which are labelled by city names. For each city, we want to have separate
> windows and isolate the processing. If the watermark returns a different
> city name, we end the previous window and start a new one. Alternatively,
> we could use both high and low watermarks to indicate the start and
> end of a city's data. This could mean having data for multiple windows
> arriving at the same time.
>

To me, city looks like a key, and you are trying to make the case that each
key should have a separate watermark. That is the case discussed on the
Flink/Beam list that David referred to. I think we should not mix the concepts
of watermark and key.

> 2. Ordered watermarks
> Instead of having the ordered field as time, why not consider something
> like an ordered watermark? Time-based watermarks could extend from that.
> An ordered watermark could be used when we have a sequence of data
> tuples and need to demarcate every n tuples. Even though we can say that
> the window is definitely closed every n tuples, the decision is made
> only when the upstream operator sends the watermark tuple. The windowed
> operator has no knowledge of n; it blindly opens and closes windows based
> on watermarks received from upstream. This means different windows may
> have different values of n.
>
> Please let me know your thoughts on this.
>

Watermarks are already ordered, and the state management is built on that.
Is your concern just the naming?

Thanks,
Thomas
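The "ordered watermark" idea, and the reply that watermarks are already ordered, can be illustrated with a small Java sketch. OrderedWatermark, getPosition, SequenceWatermark and WatermarkDrivenWindow are hypothetical names invented for this example, and WatermarkTuple is re-declared only in the shape quoted above; none of this is the actual Apex API. It treats a time-based watermark as one special case of an ordered position, and shows an operator that has no notion of n and closes a window only when upstream sends a watermark, so consecutive windows can hold different numbers of tuples.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedWatermarkSketch {

  /** Hypothetical generalization: a watermark is any monotonically increasing position. */
  interface OrderedWatermark {
    long getPosition();
  }

  /** The time-based shape quoted in the thread, recast as one kind of ordered watermark. */
  interface WatermarkTuple extends OrderedWatermark {
    long getTimestamp();

    @Override
    default long getPosition() {
      return getTimestamp(); // for event time, the position is the timestamp
    }
  }

  /** Hypothetical count-based watermark: upstream emits it after some number of tuples. */
  static final class SequenceWatermark implements OrderedWatermark {
    private final long tupleCount;

    SequenceWatermark(long tupleCount) {
      this.tupleCount = tupleCount;
    }

    @Override
    public long getPosition() {
      return tupleCount;
    }
  }

  /** Hypothetical operator: it never counts tuples, it just closes a window per watermark. */
  static final class WatermarkDrivenWindow<T> {
    private final List<T> open = new ArrayList<>();
    private final List<List<T>> closed = new ArrayList<>();
    private long lastPosition = Long.MIN_VALUE;

    void processTuple(T tuple) {
      open.add(tuple);
    }

    void processWatermark(OrderedWatermark watermark) {
      // The operator relies on positions never going backwards; reject a regression
      // instead of trying to reopen windows that were already closed.
      if (watermark.getPosition() < lastPosition) {
        throw new IllegalStateException("watermark regressed to " + watermark.getPosition());
      }
      lastPosition = watermark.getPosition();
      closed.add(new ArrayList<>(open));
      open.clear();
    }

    List<List<T>> closedWindows() {
      return closed;
    }
  }

  public static void main(String[] args) {
    WatermarkDrivenWindow<String> byCount = new WatermarkDrivenWindow<>();
    byCount.processTuple("a");
    byCount.processTuple("b");
    byCount.processWatermark(new SequenceWatermark(2)); // upstream chose n = 2 here
    byCount.processTuple("c");
    byCount.processTuple("d");
    byCount.processTuple("e");
    byCount.processWatermark(new SequenceWatermark(5)); // ... and n = 3 here
    System.out.println(byCount.closedWindows()); // [[a, b], [c, d, e]]

    // A time-based watermark goes through exactly the same code path.
    WatermarkTuple timeWatermark = () -> 1_000L;
    System.out.println(timeWatermark.getPosition()); // 1000
  }
}
```

In this sketch the operator only compares longs, so the time-based and count-based variants differ in what the position means rather than in how window state is managed.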
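The suggestion to treat the city as a key, rather than folding it into the watermark, can be sketched as follows. KeyedWindowSketch, Event, the fixed window size and the sum aggregation are hypothetical choices for illustration, not Apex's windowed operator API. Window state is partitioned by city, so each city's data is processed in isolation, while a single ordered event-time watermark decides when windows close for all cities at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedWindowSketch {

  /** Hypothetical input tuple: a key (city), an event time, and a value to aggregate. */
  static final class Event {
    final String city;
    final long time;
    final int value;

    Event(String city, long time, int value) {
      this.city = city;
      this.time = time;
      this.value = value;
    }
  }

  private final long windowSize;
  // city -> (window start -> running sum); the key partitions state, the watermark does not.
  private final Map<String, TreeMap<Long, Integer>> state = new TreeMap<>();
  private final List<String> emitted = new ArrayList<>();

  KeyedWindowSketch(long windowSize) {
    this.windowSize = windowSize;
  }

  /** Assigns the tuple to a fixed-size event-time window within its city's own state. */
  void processTuple(Event e) {
    long windowStart = e.time - Math.floorMod(e.time, windowSize);
    state.computeIfAbsent(e.city, k -> new TreeMap<>())
        .merge(windowStart, e.value, Integer::sum);
  }

  /** One ordered watermark closes, for every city, each window ending at or before it. */
  void processWatermark(long watermark) {
    for (Map.Entry<String, TreeMap<Long, Integer>> entry : state.entrySet()) {
      TreeMap<Long, Integer> windows = entry.getValue();
      while (!windows.isEmpty() && windows.firstKey() + windowSize <= watermark) {
        Map.Entry<Long, Integer> done = windows.pollFirstEntry();
        emitted.add(entry.getKey() + "@" + done.getKey() + "=" + done.getValue());
      }
    }
  }

  List<String> emitted() {
    return emitted;
  }

  public static void main(String[] args) {
    KeyedWindowSketch op = new KeyedWindowSketch(10);
    op.processTuple(new Event("Pune", 1, 5));
    op.processTuple(new Event("Austin", 3, 2));
    op.processTuple(new Event("Pune", 8, 1));
    op.processTuple(new Event("Pune", 12, 4)); // falls into Pune's [10, 20) window
    op.processWatermark(10);
    System.out.println(op.emitted()); // [Austin@0=2, Pune@0=6]
  }
}
```

Here isolating cities needs no new watermark type; the per-key watermark variant discussed on the Flink/Beam list would instead replace the single `watermark` argument with a separate position per city.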