Hi All,

Watermark tuples in Apex are very tightly coupled to event time processing.
For this reason, they are usually modeled as having a timestamp:

public interface WatermarkTuple
{
  long getTimestamp();
}

Even though watermarks are meant for such time-related processing, I think
we should expand the concept of watermarks to cover the following types:

1. Labelled watermarks
This could be useful in scenarios where, instead of a timestamp (which is
an ordered field), we have categorical values. For example, consider tuples
which are labelled by city name. For each city, we want to have a separate
window and isolate the processing. If a watermark carries a different city
name, we end the previous window and start a new one. Alternatively, we
could use both low and high watermarks to indicate the start and end of a
city's data. This could mean that data for multiple windows arrives at the
same time.
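
As a rough sketch of the low/high variant (LabelledWindows and its method
names are hypothetical and not part of the Apex API; dropping tuples that
have no open window is also just an assumption made for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an Apex API: keeps one open window per label.
final class LabelledWindows<T>
{
  private final Map<String, List<T>> open = new HashMap<>();

  // Low watermark: the label's data starts, so open a window for it.
  void lowWatermark(String label)
  {
    open.putIfAbsent(label, new ArrayList<>());
  }

  // Data tuple: add it to the label's window; returns false if none is open.
  boolean tuple(String label, T value)
  {
    List<T> window = open.get(label);
    if (window == null) {
      return false;
    }
    window.add(value);
    return true;
  }

  // High watermark: the label's data ends, so close and return its window.
  List<T> highWatermark(String label)
  {
    List<T> window = open.remove(label);
    return window == null ? Collections.<T>emptyList() : window;
  }

  // Number of windows currently open (one per label).
  int openCount()
  {
    return open.size();
  }
}
```

Several labels can have open windows at once, which is the "multiple
windows' data incoming at the same time" case.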

2. Ordered watermarks
Instead of restricting the ordered field to time, why not consider
something like an Ordered Watermark? Time-based watermarks could extend
from that. An ordered watermark could be used when we have a sequence of
data tuples and need to demarcate every n tuples. Even though we can say
that the window is definitely closed every n tuples, the decision is made
only when the upstream sends the watermark tuple. The windowed operator
does not have any clue about n. It blindly opens and closes windows based
on the watermarks received from upstream. This could mean that different
windows have different values of n.
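
A minimal sketch of such a hierarchy, assuming a generic ordering key (all
names here, including OrderedWatermark, getOrderKey, SequenceWatermark and
WatermarkDrivenWindows, are hypothetical and not existing Apex types;
ignoring out-of-order watermarks is likewise only an assumption):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a watermark ordered by any comparable key, not only time.
interface OrderedWatermark<K extends Comparable<K>>
{
  K getOrderKey();
}

// A time-based watermark becomes one special case of an ordered watermark.
final class TimeWatermark implements OrderedWatermark<Long>
{
  private final long timestamp;

  TimeWatermark(long timestamp) { this.timestamp = timestamp; }

  long getTimestamp() { return timestamp; }

  @Override
  public Long getOrderKey() { return timestamp; }
}

// A watermark ordered by the sequence number of the last tuple it covers.
final class SequenceWatermark implements OrderedWatermark<Long>
{
  private final long sequence;

  SequenceWatermark(long sequence) { this.sequence = sequence; }

  @Override
  public Long getOrderKey() { return sequence; }
}

// The operator knows nothing about n; it closes a window per watermark.
final class WatermarkDrivenWindows<T, K extends Comparable<K>>
{
  private final List<List<T>> closed = new ArrayList<>();
  private List<T> current = new ArrayList<>();
  private K lastKey;

  void tuple(T value) { current.add(value); }

  // Closes the current window unless the watermark does not advance the key.
  boolean watermark(OrderedWatermark<K> wm)
  {
    K key = wm.getOrderKey();
    if (lastKey != null && key.compareTo(lastKey) <= 0) {
      return false;
    }
    lastKey = key;
    closed.add(current);
    current = new ArrayList<>();
    return true;
  }

  List<List<T>> closedWindows() { return closed; }
}
```

If upstream sends a watermark after 2 tuples and the next one after 3 more,
the operator ends up with windows of size 2 and 3 without ever being told n.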

Please let me know your thoughts on this.

~ Bhupesh
