kennknowles commented on pull request #14268:
URL: https://github.com/apache/beam/pull/14268#issuecomment-813487929
> A high-level question regarding
>
> > it was added as response to a real user problem: sliding windows + EARLIEST timestamp combiner delays downstream aggregations a lot
> > but when that problem occurred, timestamp combiner EARLIEST was the default; with the change to how lateness is defined we changed the default to END_OF_WINDOW
>
> With this PR, what will happen to the real user problem mentioned above?

Short answer: the real user problem will be back, but not as bad.

Longer answer: EARLIEST was the default when `getOutputTime` was
introduced to solve the problem. Now the default is END_OF_WINDOW, so the
problem is not as bad. To use sliding windows (or other overlapping windows)
with EARLIEST, the user has some choices:
- Just be OK with the delay of downstream GBK (already true for Python)
- Set up a non-default trigger (this will free watermark holds and allow
  progress downstream)
- Manually move element timestamps forward with a `ParDo` before GBK,
  achieving identical behavior
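
To make the trade-off concrete, here is a rough model of the watermark hold per window. It is a sketch, not Beam code: timestamps are plain integers, every helper name is hypothetical, and `hold_shifted` encodes my assumption about the kind of forward shift the third option would apply (never hold earlier than one period before the window end). It also simplifies the default trigger to "release the hold when the window ends".

```python
# Illustrative model only: plain integers stand in for timestamps, and none of
# these helpers are Beam APIs.

def sliding_windows(ts, size, period):
    """Return every [start, end) sliding window that contains ts."""
    last_start = ts - (ts % period)
    return [(s, s + size) for s in range(last_start, ts - size, -period)]


def hold_earliest(ts, window):
    """EARLIEST: the hold sits at the element's own timestamp."""
    return ts


def hold_end_of_window(ts, window):
    """END_OF_WINDOW: the hold sits at the last instant inside the window."""
    return window[1] - 1


def hold_shifted(ts, window, period):
    """Assumed shift rule: hold no earlier than one period before the window end."""
    return max(ts, window[1] - period)


def worst_lag(ts, size, period, hold_fn):
    """Largest gap between a window's hold and the end of that window."""
    return max(
        end - hold_fn(ts, (start, end))
        for start, end in sliding_windows(ts, size, period)
    )


if __name__ == "__main__":
    ts, size, period = 25, 60, 10
    print(worst_lag(ts, size, period, hold_earliest))
    print(worst_lag(ts, size, period, hold_end_of_window))
    print(worst_lag(ts, size, period, lambda t, w: hold_shifted(t, w, period)))
```

In this model the EARLIEST hold can trail the window end by almost the full window size, while the shifted hold trails by at most one period, which is why moving timestamps forward before the GBK keeps downstream aggregations from stalling.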
I think further discussion should continue on the dev@ thread instead of the
PR though.