Hi All,

Watermark tuples in Apex are very tightly coupled to event-time processing. For this reason, they are usually modeled as carrying a timestamp:

    public interface WatermarkTuple {
        long getTimestamp();
    }

Even though watermarks are meant for such time-related processing, I think we should extend the concept of watermarks to cover the following types:

1. Labelled watermarks

This could be useful in scenarios where, instead of a timestamp (which is an ordered field), we have categorical values. For example, consider tuples labelled by city name. For each city, we want a separate window so that its processing is isolated. When a watermark carries a different city name, we end the previous window and start a new one. Alternatively, we could use both high and low watermarks to mark the start and end of a city's data. This could mean that data for multiple windows arrives at the same time.

2. Ordered watermarks

Instead of restricting the ordered field to time, why not generalize it to an Ordered Watermark? Time-based watermarks could then extend from it. An ordered watermark could be used when we have a sequence of data tuples and need to demarcate every n tuples. Even though we can say that a window is definitely closed every n tuples, the decision is made only when the upstream operator sends the watermark tuple. The windowed operator has no knowledge of n; it blindly opens and closes windows based on the watermarks it receives from upstream. As a result, different windows may end up with different values of n.

Please let me know your thoughts on this.

~ Bhupesh
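
P.S. To make the labelled and ordered watermark proposals a bit more concrete, here is a rough, self-contained Java sketch. All names in it (`OrderedWatermark`, `TimeBasedWatermark`, `LabelledWatermark`, and the two toy operators) are hypothetical and not part of any existing Apex API. The toy operators only buffer tuples and record closed windows. The labelled one closes a window when a watermark with a new label arrives, and the ordered one closes a window whenever upstream sends an advancing watermark, without ever knowing n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical: a watermark whose value comes from any totally ordered domain.
interface OrderedWatermark<T extends Comparable<T>> {
    T getOrderedValue();
}

// Hypothetical: the time-based case as a specialization of OrderedWatermark,
// roughly playing the role of today's WatermarkTuple.
interface TimeBasedWatermark extends OrderedWatermark<Long> {
    long getTimestamp();

    @Override
    default Long getOrderedValue() {
        return getTimestamp();
    }
}

// Hypothetical: a watermark carrying a categorical label (e.g. a city name).
interface LabelledWatermark {
    String getLabel();
}

// Toy operator: a watermark with a new label closes the open window and opens
// a new one. Tuples are assumed to follow the watermark that opens their window.
class LabelledWindowOperator {
    private String currentLabel; // label of the open window, null before the first watermark
    private final List<String> buffer = new ArrayList<>();
    final List<String> closedWindows = new ArrayList<>(); // "label:[tuples]" summaries

    void processTuple(String tuple) { buffer.add(tuple); }

    void processWatermark(LabelledWatermark wm) {
        if (Objects.equals(wm.getLabel(), currentLabel)) {
            return; // same label: keep the current window open
        }
        if (currentLabel != null) {
            closedWindows.add(currentLabel + ":" + buffer);
            buffer.clear();
        }
        currentLabel = wm.getLabel();
    }
}

// Toy operator: it has no idea what n is; it closes the open window whenever
// upstream sends a watermark that advances past the previous one.
class OrderedWindowOperator<T extends Comparable<T>> {
    private T lastMark; // null until the first watermark arrives
    private final List<String> buffer = new ArrayList<>();
    final List<List<String>> closedWindows = new ArrayList<>();

    void processTuple(String tuple) { buffer.add(tuple); }

    void processWatermark(OrderedWatermark<T> wm) {
        T mark = wm.getOrderedValue();
        if (lastMark == null || mark.compareTo(lastMark) > 0) {
            closedWindows.add(new ArrayList<>(buffer));
            buffer.clear();
            lastMark = mark;
        }
    }
}

public class WatermarkSketch {
    public static void main(String[] args) {
        LabelledWindowOperator byCity = new LabelledWindowOperator();
        byCity.processWatermark(() -> "Pune");
        byCity.processTuple("p1");
        byCity.processTuple("p2");
        byCity.processWatermark(() -> "Mumbai");
        byCity.processTuple("m1");
        byCity.processWatermark(() -> "Mumbai"); // same label, window stays open
        byCity.processTuple("m2");
        byCity.processWatermark(() -> "Delhi");
        System.out.println(byCity.closedWindows); // [Pune:[p1, p2], Mumbai:[m1, m2]]

        // Upstream decides where windows end: here after 2 tuples, then after 3 more.
        OrderedWindowOperator<Long> bySequence = new OrderedWindowOperator<>();
        bySequence.processTuple("a");
        bySequence.processTuple("b");
        bySequence.processWatermark(() -> 2L);
        bySequence.processTuple("c");
        bySequence.processTuple("d");
        bySequence.processTuple("e");
        bySequence.processWatermark(() -> 5L);
        System.out.println(bySequence.closedWindows); // [[a, b], [c, d, e]]

        TimeBasedWatermark timed = () -> 1000L;
        System.out.println(timed.getOrderedValue()); // 1000
    }
}
```

The second run shows the point from proposal 2: the two windows end up with different sizes (2 and 3) purely because of where upstream chose to emit watermarks.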