On Tue, Dec 5, 2017 at 12:00 PM, Kenneth Knowles <[email protected]> wrote:

> in the unbounded case I would like to be specified as having to agree with
> the bounded spec for any finite prefix. I'm not sure if an operational view
> is amenable to this.
>
It seems to me that the only way to reconcile the two views is with perfect
watermarks. Historically, we've taken the position that watermarks are
necessarily heuristic, but a change in perspective could yield gains in
model simplicity. In particular, thinking of late data as an error and late
triggers as error-handling code (which would typically just lead to an
alert) would simplify pipeline logic significantly in cases where what to
do with late data is non-obvious. At present, watermark-correctness is
something that's not well-supported and over which we don't have much
control, but the situation will likely improve as consistent streaming
gains market share. Reconsidering the merits of heuristic vs perfect
watermarks would be a natural thing to do as that happens.
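
To make the "late data as an error" idea concrete, here is a minimal sketch
in plain Python (deliberately not the Beam API). The names `run`, `on_late`,
and `WINDOW_SIZE`, and the event encoding, are all assumptions made up for
illustration. Each fixed window is emitted exactly once when the watermark
passes its end, and anything that shows up for an already-emitted window goes
to an error handler instead of refining the result.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical fixed-window width, in event-time units


def window_start(ts):
    """Start of the fixed window containing event timestamp ts."""
    return ts - ts % WINDOW_SIZE


def run(events, on_late):
    """Sum values per fixed window, emitting each window exactly once.

    events: iterable of ("data", timestamp, value) or ("watermark", timestamp).
    on_late: hypothetical error handler, called as on_late(timestamp, value).
    """
    open_sums = defaultdict(int)   # windows not yet emitted
    results = {}                   # window start -> final sum
    watermark = float("-inf")
    for event in events:
        if event[0] == "watermark":
            watermark = max(watermark, event[1])
            # Emit every window whose end the watermark has passed.
            closed = sorted(s for s in open_sums if s + WINDOW_SIZE <= watermark)
            for start in closed:
                results[start] = open_sums.pop(start)
        else:
            _, ts, value = event
            if window_start(ts) + WINDOW_SIZE <= watermark:
                # The watermark's promise was broken: treat it as an error
                # (for example, raise an alert) rather than updating output.
                on_late(ts, value)
            else:
                open_sums[window_start(ts)] += value
    return results
```

If the watermark is perfect, `on_late` is never called, and the emitted sums
are the same ones a bounded run over the same elements would compute.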

> and then there's merging...
>
FWIW, your argument for windowing being an important property of a
PCollection still holds for merging windows. It's just more complex because
the analogy between windows and keys breaks down. The windows remain a
property of the elements within the PCollection that affects the shape of
any "tables" that will be computed with it.
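
As a rough illustration of one reading of this, the sketch below (plain
Python, not the Beam API) uses session-style windows. `GAP`, `assign`,
`merge_windows`, and `group_by_key_and_window` are hypothetical names. Every
element carries its own proto-window, but the rows of the resulting table are
only known once the windows for a key have been merged, which is where
treating the window as just another key component stops working.

```python
GAP = 5  # hypothetical session gap, in event-time units


def assign(ts):
    """Each element starts out in its own proto-window [ts, ts + GAP)."""
    return (ts, ts + GAP)


def merge_windows(windows):
    """Merge overlapping [start, end) windows into sessions."""
    merged = []
    for start, end in sorted(windows):
        if merged and start < merged[-1][1]:
            # Overlaps the previous session: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def group_by_key_and_window(elements):
    """elements: iterable of (key, timestamp, value). Returns summed rows."""
    per_key = {}
    for key, ts, value in elements:
        per_key.setdefault(key, []).append((assign(ts), value))
    table = {}
    for key, windowed in per_key.items():
        for start, end in merge_windows([w for w, _ in windowed]):
            # A row's shape (its window) depends on the other elements
            # that share the key, not on any single element alone.
            table[(key, (start, end))] = sum(
                v for (s, e), v in windowed if start <= s and e <= end
            )
    return table
```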
