Hi all,

Yet another exciting corner of portability, discovered during debugging.
Some discussion at https://github.com/apache/beam/pull/14001 and
https://github.com/apache/beam/pull/13998

**In Java since around the beginning of Beam**
When a merging WindowFn goes through a GBK/Combine and windows are merged,
the downstream windowing is changed to "InvalidWindows" which will fail any
downstream GBK. The user is required to re-window before another GBK.

It was to protect a user from this:

1. User sets keys and chooses session windowing
2. User groups/combines by session
3. User computes the outputs to produce some new keys
4. User groups again

The result usually does not make sense. Because it was forbidden we never
decided whether things should merge again or not.

**In protos**
The MergeStatus enum has NON_MERGING, NEEDS_MERGE, and ALREADY_MERGED. It
is documented that ALREADY_MERGED is for sessions/merging windows after a
GBK.

This is _maybe_ better. It allows the windows to just be carried along. It
is a major model change and would require SDK support. But it might still
not make sense because the chances that two elements have exactly the same
merging window are very low for something like sessions. It may be useful
for advanced tricks with merging windows, but noone is doing that because
no SDK supports it.

**In Python**
I think nothing is done. The next GBK will merge again. I could be wrong -
I just read the code very quickly and don't know it that well.

**In Go**
I didn't even check. Maybe someone can add the status to the thread.

Kenn

Reply via email to