Re: Triggers still finish and drop all data

Jan Lukavský Thu, 31 Oct 2019 02:12:03 -0700

Hi Kenn,

is there still any use for a trigger finishing? If we don't drop data, would it still be of any use to users? If not, would it be better to just remove the functionality completely, so that users who use it (and for whom it will possibly break) become aware of it at compile time?


Jan

On 10/30/19 11:26 PM, Kenneth Knowles wrote:

Problem: a trigger can "finish", which causes a window to "close" and drop all remaining data arriving for that window.
This has been discussed many times and I thought it was fixed, but it seems not to be. It does not seem to have its own Jira or thread that I can find. But here are some pointers:
- data loss bug: https://lists.apache.org/thread.html/ce413231d0b7d52019668765186ef27a7ffb69b151fdb34f4bf80b0f@%3Cdev.beam.apache.org%3E
- user hitting the bug: https://lists.apache.org/thread.html/28879bc80cd5c7ef1a3e38cb1d2c063165d40c13c02894bbccd66aca@%3Cuser.beam.apache.org%3E
- user confusion: https://lists.apache.org/thread.html/2707aa449c8c6de1c6e3e8229db396323122304c14931c44d0081449@%3Cuser.beam.apache.org%3E
- thread from 2016 on the topic: https://lists.apache.org/thread.html/5f44b62fdaf34094ccff8da2a626b7cd344d29a8a0fff6eac8e148ea@%3Cdev.beam.apache.org%3E
In theory, trigger finishing was intended for users who can get their answers from a smaller amount of data and then drop the rest. In practice, triggers aren't really expressive enough for this. Stateful DoFn is the solution for these cases.
I've opened https://github.com/apache/beam/pull/9942 which makes the following changes:
- when a trigger says it is finished, it never fires again but data is still kept
- at GC time the final output will be emitted
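
To make the difference concrete, here is a rough toy model in plain Python. It is not Beam code and not what the PR implements; the class, its fields, and the "fire after N elements" trigger are all made up for illustration, and it assumes discarding panes and that any data still buffered at GC is emitted as one final pane.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WindowSim:
    """Hypothetical model of one window whose trigger finishes after firing once."""

    fire_after: int          # illustrative trigger: fire once this many elements are buffered
    drop_after_finish: bool  # True models the data-dropping behavior, False the proposed one
    buffer: List[int] = field(default_factory=list)
    finished: bool = False
    outputs: List[List[int]] = field(default_factory=list)
    dropped: List[int] = field(default_factory=list)

    def on_element(self, value: int) -> None:
        if self.finished and self.drop_after_finish:
            # Window is "closed": everything arriving later is lost.
            self.dropped.append(value)
            return
        self.buffer.append(value)
        if not self.finished and len(self.buffer) >= self.fire_after:
            # The trigger fires once and then reports itself finished.
            self.outputs.append(self.buffer)
            self.buffer = []
            self.finished = True

    def on_gc(self) -> None:
        # Toy assumption: whatever is still buffered at GC time becomes a final pane.
        if self.buffer:
            self.outputs.append(self.buffer)
            self.buffer = []


def run(elements: List[int], fire_after: int,
        drop_after_finish: bool) -> Tuple[List[List[int]], List[int]]:
    """Feed elements through one simulated window, then garbage-collect it."""
    window = WindowSim(fire_after=fire_after, drop_after_finish=drop_after_finish)
    for value in elements:
        window.on_element(value)
    window.on_gc()
    return window.outputs, window.dropped


if __name__ == "__main__":
    print(run([1, 2, 3, 4, 5], 2, True))   # ([[1, 2]], [3, 4, 5])
    print(run([1, 2, 3, 4, 5], 2, False))  # ([[1, 2], [3, 4, 5]], [])
```

In the first run the elements 3, 4, 5 are silently lost once the trigger finishes; in the second they are held and come out in the final pane at GC.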
As with all bugfixes, this is backwards-incompatible (if your pipeline relies on buggy behavior, it will stop working). So this is a major change that I wanted to discuss on dev@.
Kenn
