On Tue, Dec 5, 2017 at 12:00 PM, Kenneth Knowles <[email protected]> wrote:
> in the unbounded case I would like to be specified as having to agree with
> the bounded spec for any finite prefix. I'm not sure if an operational view
> is amenable to this.
>

It seems to me that the only way to reconcile the two views is with perfect watermarks. Historically, we've taken the position that watermarks are necessarily heuristic, but a change in perspective could yield gains in model simplicity. In particular, thinking of late data as an error and late triggers as error-handling code (which would typically just lead to an alert) would simplify pipeline logic significantly in cases where what to do with late data is non-obvious. At present, watermark-correctness is something that's not well-supported and over which we don't have much control, but the situation will likely improve as consistent streaming gains market share. Reconsidering the merits of heuristic vs perfect watermarks would be a natural thing to do as that happens.

and then there's merging...

> FWIW, your argument for windowing being an important property of a PCollection still holds for merging windows. It's just more complex because the analogy between windows and keys breaks down. The windows remain a property of the elements within the PCollection that affects the shape of any "tables" that will be computed with it.
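
To make the merging point concrete, here is a minimal sketch in plain Python. It is not the Beam API: `merge_sessions`, the half-open interval convention, and the gap value are all assumptions made up for illustration. Each timestamp gets a proto-window [t, t + gap), and overlapping proto-windows are merged, roughly the way session windows behave. The point is only that the window an element ends up in depends on which other elements are present, which is never true of a key.

```python
from typing import List, Tuple

# Hypothetical representation: a window is a half-open interval [start, end).
Window = Tuple[int, int]


def merge_sessions(timestamps: List[int], gap: int) -> List[Window]:
    """Assign each timestamp the proto-window [t, t + gap), then merge overlaps."""
    merged: List[Window] = []
    for start, end in sorted((t, t + gap) for t in timestamps):
        if merged and start < merged[-1][1]:
            # Overlaps the previous window: extend it instead of opening a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two elements far enough apart stay in separate windows.
print(merge_sessions([1, 12], gap=10))

# Adding a third element at t=8 bridges them, so the element at t=1
# now lands in a different window than it did before.
print(merge_sessions([1, 12, 8], gap=10))
```

With a fixed (non-merging) windowing the second call could only add a window, never change the one that the element at t=1 belongs to. That is where the window-as-key analogy stops working, even though the window is still a property of each element.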
