Hi Ajay,

In the proposed scenario, the source needs to generate more types of
watermarks, and the windowed operator needs to be aware of them. Each such
watermark would carry information specific to its type, e.g. labels or
sequence numbers. The windowed operator could thus work with watermarks
that are based on something other than time.
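
To make this concrete, here is a rough sketch of how such watermark types
could be laid out. All names below (GenericWatermark, OrderedWatermark,
TimeWatermark, LabelledWatermark, LabelWindowTracker) are hypothetical and
only for discussion; none of them exist in Apex or Malhar under these names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Apex or Malhar under these names.

/** Common parent for all watermark tuples, whatever they are based on. */
interface GenericWatermark
{
}

/** A watermark over any ordered field; T supplies the ordering. */
interface OrderedWatermark<T extends Comparable<T>> extends GenericWatermark
{
  T getPosition();
}

/** The time-based watermark, expressed as a special case of an ordered one. */
final class TimeWatermark implements OrderedWatermark<Long>
{
  private final long timestamp;

  TimeWatermark(long timestamp)
  {
    this.timestamp = timestamp;
  }

  public long getTimestamp()
  {
    return timestamp;
  }

  @Override
  public Long getPosition()
  {
    return timestamp;
  }
}

/** A watermark carrying a categorical label and marking the start or end of its data. */
final class LabelledWatermark implements GenericWatermark
{
  enum Boundary { BEGIN, END }

  private final String label;
  private final Boundary boundary;

  LabelledWatermark(String label, Boundary boundary)
  {
    this.label = label;
    this.boundary = boundary;
  }

  String getLabel()
  {
    return label;
  }

  Boundary getBoundary()
  {
    return boundary;
  }
}

/** Tracks which label windows are open, driven purely by watermarks from upstream. */
final class LabelWindowTracker
{
  private final Set<String> openLabels = new LinkedHashSet<>();

  void onWatermark(LabelledWatermark watermark)
  {
    if (watermark.getBoundary() == LabelledWatermark.Boundary.BEGIN) {
      openLabels.add(watermark.getLabel());
    } else {
      openLabels.remove(watermark.getLabel());
    }
  }

  Set<String> getOpenLabels()
  {
    return openLabels;
  }
}
```

With BEGIN and END markers per label, several label windows can be open at
the same time, and the operator never has to interpret a timestamp.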

Implicit watermarks are needed when we do not have the capability to
generate watermarks at the source. In that case, the windowed operator
could itself "assume" the watermarks according to pre-defined logic and
act as if they had been received from upstream. The proposed watermark
types could have implicit variants as well, since "implicit watermark" is
an orthogonal concern.
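
As an illustration of why the two concerns are orthogonal, here is a
minimal sketch of an operator that handles explicit and implicit watermarks
through the same code path. The names ImplicitSequencePolicy and
SequenceWindowedOperator are made up for this example and are not meant to
mirror the actual Malhar windowed operator API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the names below are illustrative only and do not
// mirror the actual Malhar windowed operator API.

/** Pre-defined logic that decides when the operator should assume a watermark. */
interface ImplicitSequencePolicy
{
  /** Returns true if a watermark should be assumed after this many tuples. */
  boolean shouldEmitAfter(long tuplesSeen);
}

/** Collects tuples and closes a window whenever a watermark arrives or is assumed. */
final class SequenceWindowedOperator<T>
{
  // May be null, in which case only upstream watermarks close windows.
  private final ImplicitSequencePolicy policy;
  private final List<List<T>> closedWindows = new ArrayList<>();
  private List<T> current = new ArrayList<>();
  private long tuplesSeen;

  SequenceWindowedOperator(ImplicitSequencePolicy policy)
  {
    this.policy = policy;
  }

  void processTuple(T tuple)
  {
    current.add(tuple);
    tuplesSeen++;
    if (policy != null && policy.shouldEmitAfter(tuplesSeen)) {
      // Act as if the watermark had been received from upstream.
      processWatermark();
    }
  }

  /** Called for an explicit upstream watermark, or internally for an implicit one. */
  void processWatermark()
  {
    closedWindows.add(current);
    current = new ArrayList<>();
  }

  List<List<T>> getClosedWindows()
  {
    return closedWindows;
  }
}
```

For example, new SequenceWindowedOperator<Integer>(n -> n % 3 == 0) would
close a window after every third tuple without any help from the source,
while passing null leaves the decision entirely to upstream.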

~ Bhupesh




_______________________________________________________

Bhupesh Chawda

E: bhup...@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Fri, Mar 10, 2017 at 5:51 PM, AJAY GUPTA <ajaygit...@gmail.com> wrote:

> Hi Bhupesh,
>
> For point 1, can't we make use of implicitWatermarkGenerator?
>
>
> Ajay
>
> On Wed, Mar 8, 2017 at 12:16 PM, Bhupesh Chawda <bhup...@apache.org>
> wrote:
>
> > Hi All,
> >
> > Watermark tuples in Apex are very tightly coupled to event time
> > processing. For this reason, usually they are modeled as having a
> > timestamp.
> >
> > public interface WatermarkTuple
> > {
> >   long getTimestamp();
> > }
> >
> > Even though watermarks are meant for such time-related processing, I
> > think we should expand the concept of watermarks to cover the following
> > types:
> >
> > 1. Labelled watermarks
> > This could be useful in scenarios where, instead of a timestamp (which
> > is an ordered field), we have categorical values. For example, consider
> > tuples which are labeled by city names. For each city, we want to have
> > separate windows and isolate the processing. If a watermark carries a
> > different city name, we end the previous window and start a new one.
> > Or, in this case, we could make use of both high and low watermarks
> > indicating the start and end of a city's data. This could mean having
> > multiple windows' data incoming at the same time.
> >
> > 2. Ordered watermarks
> > Instead of having the ordered field as time, why not consider something
> > like an Ordered Watermark? Time-based watermarks could extend from that.
> > An ordered watermark could be used in case we have a sequence of data
> > tuples and we need to demarcate every n tuples. Even though we can say
> > that the window is definitely closed every n tuples, the decision is
> > made only when the upstream sends the watermark tuple. The windowed
> > operator does not have any clue about it. It blindly opens and closes
> > windows based on watermarks received from upstream. This could mean
> > different windows may have different values of n.
> >
> > Please let me know your thoughts on this.
> >
> > ~ Bhupesh
> >
>
