Opened https://github.com/apache/beam/pull/9960 for this idea. This will alert users to broken pipelines and force them to alter them.
Kenn On Thu, Oct 31, 2019 at 2:12 PM Kenneth Knowles <k...@apache.org> wrote: > On Thu, Oct 31, 2019 at 2:11 AM Jan Lukavský <je...@seznam.cz> wrote: > >> Hi Kenn, >> >> does there still remain some use for trigger to finish? If we don't drop >> data, would it still be of any use to users? If not, would it be better >> to just remove the functionality completely, so that users who use it >> (and it will possibly break for them) are aware of it at compile time? >> >> Jan >> > > Good point. I believe there is no good use for a top-level trigger > finishing. As mentioned, the intended uses aren't really met by triggers, > but are met by stateful DoFn. > > Eugene's bug even has this title :-). We could not change any behavior but > just reject pipelines with broken top-level triggers. This is probably a > better solution. Because if a user has a broken trigger, the new behavior > is probably not enough to magically fix their pipeline. They are better off > knowing that they are broken and fixing it. > > And at that point, there is a lot of dead code and my PR is really just > cleaning it up as a simplification. > > Kenn > > > >> On 10/30/19 11:26 PM, Kenneth Knowles wrote: >> > Problem: a trigger can "finish" which causes a window to "close" and >> > drop all remaining data arriving for that window. >> > >> > This has been discussed many times and I thought fixed, but it seems >> > to not be fixed. It does not seem to have its own Jira or thread that >> > I can find. But here are some pointers: >> > >> > - data loss bug: >> > >> https://lists.apache.org/thread.html/ce413231d0b7d52019668765186ef27a7ffb69b151fdb34f4bf80b0f@%3Cdev.beam.apache.org%3E >> > - user hitting the bug: >> > >> https://lists.apache.org/thread.html/28879bc80cd5c7ef1a3e38cb1d2c063165d40c13c02894bbccd66aca@%3Cuser.beam.apache.org%3E >> > - user confusion: >> > >> https://lists.apache.org/thread.html/2707aa449c8c6de1c6e3e8229db396323122304c14931c44d0081449@%3Cuser.beam.apache.org%3E >> > - thread from 2016 on the topic: >> > >> https://lists.apache.org/thread.html/5f44b62fdaf34094ccff8da2a626b7cd344d29a8a0fff6eac8e148ea@%3Cdev.beam.apache.org%3E >> > >> > In theory, trigger finishing was intended for users who can get their >> > answers from a smaller amount of data and then drop the rest. In >> > practice, triggers aren't really expressive enough for this. Stateful >> > DoFn is the solution for these cases. >> > >> > I've opened https://github.com/apache/beam/pull/9942 which makes the >> > following changes: >> > >> > - when a trigger says it is finished, it never fires again but data >> > is still kept >> > - at GC time the final output will be emitted >> > >> > As with all bugfixes, this is backwards-incompatible (if your pipeline >> > relies on buggy behavior, it will stop working). So this is a major >> > change that I wanted to discuss on dev@. >> > >> > Kenn >> > >> >