Problem: a trigger can "finish," which causes a window to "close" and drop all remaining data that arrives for that window.
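
For concreteness, here is a minimal sketch (not from the PR; the class name and input data are made up) of the kind of configuration affected: a trigger not wrapped in Repeatedly.forever(...) finishes after its first firing, so in a streaming pipeline any elements that arrive for that window afterwards are silently dropped.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.windowing.AfterPane;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class FinishingTriggerExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<String> events = p.apply(Create.of("a", "b", "c"));

        // AfterPane.elementCountAtLeast(1) without Repeatedly.forever(...) is a
        // finishing trigger: it fires at most once per window and is then done,
        // which is the behavior under discussion here.
        events.apply(
            Window.<String>into(FixedWindows.of(Duration.standardMinutes(10)))
                .triggering(AfterPane.elementCountAtLeast(1))
                .withAllowedLateness(Duration.standardHours(1))
                .discardingFiredPanes());

        p.run().waitUntilFinish();
      }
    }

Today the usual way to avoid this is to wrap the trigger in Repeatedly.forever(...), which never finishes.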
This has been discussed many times and I thought it was fixed, but it seems not to be. It does not seem to have its own Jira or thread that I can find, but here are some pointers:

 - data loss bug: https://lists.apache.org/thread.html/ce413231d0b7d52019668765186ef27a7ffb69b151fdb34f4bf80b0f@%3Cdev.beam.apache.org%3E
 - user hitting the bug: https://lists.apache.org/thread.html/28879bc80cd5c7ef1a3e38cb1d2c063165d40c13c02894bbccd66aca@%3Cuser.beam.apache.org%3E
 - user confusion: https://lists.apache.org/thread.html/2707aa449c8c6de1c6e3e8229db396323122304c14931c44d0081449@%3Cuser.beam.apache.org%3E
 - thread from 2016 on the topic: https://lists.apache.org/thread.html/5f44b62fdaf34094ccff8da2a626b7cd344d29a8a0fff6eac8e148ea@%3Cdev.beam.apache.org%3E

In theory, trigger finishing was intended for users who can get their answer from a smaller amount of data and then drop the rest. In practice, triggers are not really expressive enough for this; stateful DoFn is the solution for those cases.

I've opened https://github.com/apache/beam/pull/9942, which makes the following changes:

 - when a trigger says it is finished, it never fires again, but the data is still kept
 - at GC time the final output will be emitted

As with all bugfixes, this is backwards-incompatible: if your pipeline relies on the buggy behavior, it will stop working as before. So this is a major change that I wanted to discuss on dev@.

Kenn