On Fri, Jan 13, 2017 at 5:28 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > > I don't think the watermark itself is even really part of the Beam > model, and other runners may implement things differently. >
This is an interesting perspective. I think it might be a bold claim about a future that could be :-). Perhaps we don't _have_ to use a watermark to describe the fact that a runner believes a window to be complete, and perhaps even then any watermark-like structure doesn't have to be communicable as a single Instant. On the other hand, all runners today share our triggering code and lateness (https://s.apache.org/beam-lateness) is entirely defined in terms of watermarks. In APIs, we have direct references such as UnboundedSource#getWatermark() and Trigger#getWatermarkThatEnsuresFiring() for awaiting side inputs and to a lesser extent the proposed WindowMappingFn#maximumLookback() for calculating a watermark value that allows GC. So I think for now it is part of the model, but currently isolated to triggering, lateness, and resource management, not part of the question of "what is being computed?" The way I see it, this proposal would add it as part of the "what...?" aspect, so that is a major change to the model. Of course, in the absence of fully automated retractions and/or other pane reconciliation, these two are coupled anyhow. The new state and timer APIs blur the lines further as they can be used to implement a lot of "trigger-like" functionality, and timers add a lot of nondeterminism previously isolated to trigger into the main computation. So, personally, I'm on the fence. I appreciate arguments for and against giving a DoFn access to [a lower bound on] the input watermark. I'm enjoying this discussion and hope it continues with more use cases. Kenn