Re: Watermark reading API

Kenneth Knowles Tue, 17 Jan 2017 11:05:36 -0800

On Fri, Jan 13, 2017 at 5:28 PM, Robert Bradshaw <
rober...@google.com.invalid> wrote:
>
> I don't think the watermark itself is even really part of the Beam
> model, and other runners may implement things differently.
>


This is an interesting perspective. I think it might be a bold claim about
a future that could be :-). Perhaps we don't _have_ to use a watermark to
describe the fact that a runner believes a window to be complete, and
perhaps even then any watermark-like structure doesn't have to be
communicable as a single Instant.

On the other hand, all runners today share our triggering code and lateness
(https://s.apache.org/beam-lateness) is entirely defined in terms of
watermarks. In APIs, we have direct references such as
UnboundedSource#getWatermark() and Trigger#getWatermarkThatEnsuresFiring()
for awaiting side inputs and to a lesser extent the proposed
WindowMappingFn#maximumLookback() for calculating a watermark value that
allows GC.

So I think for now it is part of the model, but currently isolated to
triggering, lateness, and resource management, not part of the question of
"what is being computed?"

The way I see it, this proposal would add it as part of the "what...?"
aspect, so that is a major change to the model. Of course, in the absence
of fully automated retractions and/or other pane reconciliation, these two
are coupled anyhow. The new state and timer APIs blur the lines further as
they can be used to implement a lot of "trigger-like" functionality, and
timers add a lot of nondeterminism previously isolated to trigger into the
main computation.

So, personally, I'm on the fence. I appreciate arguments for and against
giving a DoFn access to [a lower bound on] the input watermark.

I'm enjoying this discussion and hope it continues with more use cases.

Kenn

Re: Watermark reading API

Reply via email to