Hi Ajay,

In the proposed scenario, the source must be able to generate more types of watermarks, and the windowed operator must be aware of them. Each such watermark would carry the information appropriate to its type, viz. labels, sequence numbers, etc. The windowed operator could thus work with watermarks that are based on something other than time.
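As a rough illustration, the watermark types could be layered along the following lines. This is only a sketch under assumptions: Watermark, LabelledWatermark, OrderedWatermark, TimeWatermark and CityWatermark are hypothetical names and not existing Apex APIs; only getTimestamp() is taken from the WatermarkTuple interface in the original proposal.

```java
// Hypothetical sketch; none of these types exist in Apex.

// Base marker type: a watermark makes no ordering assumption by itself.
interface Watermark
{
}

// Labelled watermark: identifies a window by a categorical value, e.g. a city name.
interface LabelledWatermark extends Watermark
{
  String getLabel();
}

// Ordered watermark: a position in any totally ordered domain (time, sequence number, ...).
interface OrderedWatermark<T extends Comparable<T>> extends Watermark
{
  T getPosition();
}

// Time-based watermark expressed as a special case of an ordered watermark.
final class TimeWatermark implements OrderedWatermark<Long>
{
  private final long timestamp;

  TimeWatermark(long timestamp)
  {
    this.timestamp = timestamp;
  }

  @Override
  public Long getPosition()
  {
    return timestamp;
  }

  // Same accessor as in the WatermarkTuple interface from the proposal.
  public long getTimestamp()
  {
    return timestamp;
  }
}

// Labelled watermark for the city example (assumed shape).
final class CityWatermark implements LabelledWatermark
{
  private final String city;

  CityWatermark(String city)
  {
    this.city = city;
  }

  @Override
  public String getLabel()
  {
    return city;
  }
}
```

With this layering, the windowed operator could be written against Watermark and decide per subtype how a window is opened or closed, while existing time-based logic keeps working through TimeWatermark.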
Implicit watermarks are needed when we do not have the capability to generate watermarks at the source. In that case, the windowed operator could itself "assume" the watermarks according to pre-defined logic and act as if they had been received from upstream. The proposed watermarks could be used as implicit variants as well, since "implicit watermark" is an orthogonal concern.

~ Bhupesh

_______________________________________________________
Bhupesh Chawda
E: bhup...@datatorrent.com | Twitter: @bhupeshsc
www.datatorrent.com | apex.apache.org

On Fri, Mar 10, 2017 at 5:51 PM, AJAY GUPTA <ajaygit...@gmail.com> wrote:

> Hi Bhupesh,
>
> For point 1, can't we make use of implicitWatermarkGenerator?
>
> Ajay
>
> On Wed, Mar 8, 2017 at 12:16 PM, Bhupesh Chawda <bhup...@apache.org> wrote:
>
> > Hi All,
> >
> > Watermark tuples in Apex are very tightly coupled to event time
> > processing. For this reason, usually they are modeled as having a
> > timestamp.
> >
> > public interface WatermarkTuple
> > {
> >   long getTimestamp();
> > }
> >
> > Even though watermarks are meant for such time-related processing, I
> > think we should expand the concept of watermarks for the following types:
> >
> > 1. Labelled watermarks
> > This could be useful in scenarios where instead of a timestamp (which is
> > an ordered field), we have categorical values. For example, consider
> > tuples which are labeled by city names. For each city, we want to have
> > separate windows and isolate the processing. If the watermark returns a
> > different city name, we end the previous window and start a new one. Or,
> > in this case we could make use of both high and low watermarks indicating
> > the start and end of a city's data. This could mean having multiple
> > windows' data incoming at the same time.
> >
> > 2. Ordered watermarks
> > Instead of having the ordered field as time, why not consider something
> > like an Ordered Watermark. TimeBased Watermarks could extend from that.
> > An ordered watermark could be used in case we have a sequence of data
> > tuples and we need to demarcate every n tuples. Even though we can say
> > that every n tuples the window is definitely closed, the decision is made
> > only when the upstream sends the watermark tuple. The windowed operator
> > does not have any clue about it. It blindly opens and closes windows
> > based on watermarks received from upstream. This could mean different
> > windows may have different values of n.
> >
> > Please let me know your thoughts on this.
> >
> > ~ Bhupesh
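
To make the "blind" behaviour in the quoted proposal concrete, here is a minimal sketch of an operator that closes a window only when a watermark arrives from upstream, so the number of tuples per window can differ. WatermarkDrivenWindower and its methods are hypothetical names chosen for illustration and do not correspond to any Apex operator or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an Apex operator: windows are closed only when a
// watermark arrives from upstream, never because of a locally counted n.
final class WatermarkDrivenWindower<T>
{
  private final List<List<T>> closedWindows = new ArrayList<>();
  private List<T> openWindow = new ArrayList<>();

  // Data tuples are simply buffered into the currently open window.
  void onData(T tuple)
  {
    openWindow.add(tuple);
  }

  // A watermark closes the open window, whatever its size, and opens a new one.
  void onWatermark()
  {
    closedWindows.add(openWindow);
    openWindow = new ArrayList<>();
  }

  // Windows closed so far, in the order their watermarks were received.
  List<List<T>> getClosedWindows()
  {
    return closedWindows;
  }
}
```

If upstream sends a watermark after two tuples and another after three more, the operator ends up with two closed windows of sizes 2 and 3, which is the "different values of n" case.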