For a pipeline that only uses data driven triggers, many of our supported use cases like window expiry calculation for GC and side input readiness fall apart and we have become reliant on a single monotonically increasing value.
On Tue, Jan 17, 2017 at 11:04 AM, Kenneth Knowles <k...@google.com.invalid> wrote: > On Fri, Jan 13, 2017 at 5:28 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > > > > I don't think the watermark itself is even really part of the Beam > > model, and other runners may implement things differently. > > > > This is an interesting perspective. I think it might be a bold claim about > a future that could be :-). Perhaps we don't _have_ to use a watermark to > describe the fact that a runner believes a window to be complete, and > perhaps even then any watermark-like structure doesn't have to be > communicable as a single Instant. > > On the other hand, all runners today share our triggering code and lateness > (https://s.apache.org/beam-lateness) is entirely defined in terms of > watermarks. In APIs, we have direct references such as > UnboundedSource#getWatermark() and Trigger#getWatermarkThatEnsuresFiring() > for awaiting side inputs and to a lesser extent the proposed > WindowMappingFn#maximumLookback() for calculating a watermark value that > allows GC. > > So I think for now it is part of the model, but currently isolated to > triggering, lateness, and resource management, not part of the question of > "what is being computed?" > > The way I see it, this proposal would add it as part of the "what...?" > aspect, so that is a major change to the model. Of course, in the absence > of fully automated retractions and/or other pane reconciliation, these two > are coupled anyhow. The new state and timer APIs blur the lines further as > they can be used to implement a lot of "trigger-like" functionality, and > timers add a lot of nondeterminism previously isolated to trigger into the > main computation. > > So, personally, I'm on the fence. I appreciate arguments for and against > giving a DoFn access to [a lower bound on] the input watermark. > > I'm enjoying this discussion and hope it continues with more use cases. > > Kenn >