Correct, everything is per-key. To allow triggering after n events you would have to given them all the same key. (Note that this would potentially introduce a bottleneck, as they would all be shuffled to the same machine.)
On Mon, Aug 17, 2020 at 4:01 PM Leiyi Zhang <[email protected]> wrote: > > Thank you for explaining the details Robert! > > I do have 1 more question: is there a way in beam that allows triggering > after n events arrived at the gbk step from previous parts of a pipeline? > because afterCount() trigger for gbk is per-key, then it cannot be used here > right? > > On Mon, Aug 17, 2020 at 3:22 PM Robert Bradshaw <[email protected]> wrote: >> >> Yeah, this is another subtlety. There's a notion of "window garbage >> collection" that's distinct from "window closing." Garbage collection >> happens regardless of whether the trigger was set iff the window is >> non-empty when the watermark + allowed lateness exceeds the end of >> window. (Well, there's a flag to control this behavior.) This has >> changed over time and I think you've uncovered a bug that the two SDKs >> are not consistent here. >> >> On Mon, Aug 17, 2020 at 2:29 PM Leiyi Zhang <[email protected]> wrote: >> > >> > is it just for the python sdk or for both python and java sdk? >> > seems like java sdk will output result even if there are less than 3 >> > elements per key. >> > >> > On Mon, Aug 17, 2020 at 2:20 PM Robert Bradshaw <[email protected]> >> > wrote: >> >> >> >> Yes, GBK is non-determanistic in the face of triggers as well. All >> >> triggers are per-key, evaluated independently for each key, so it'd be >> >> "do I have at least 3 results for this key." >> >> >> >> On Mon, Aug 17, 2020 at 2:08 PM Leiyi Zhang <[email protected]> wrote: >> >> > >> >> > Thank you very much for the reply, >> >> > Is the result of gbk non-deterministic as well? between "do I have at >> >> > least 3 results PER KEY" vs "do I have at least 3 incoming events >> >> > before I trigger GBK" >> >> > >> >> > On Mon, Aug 17, 2020 at 1:39 PM Robert Bradshaw <[email protected]> >> >> > wrote: >> >> >> >> >> >> Triggers in beam are non-determanistic; both behaviors are acceptable >> >> >> (especially for batch mode). In practice, production runners evaluate >> >> >> triggers (e.g. in this case "do I have at least two elements") whenver >> >> >> a new batch of data comes in (for the Python direct runner, in batch >> >> >> mode, all the data comes in at once). To have more control over this >> >> >> you can use TestPipeline, which will attempt to fire triggers as >> >> >> eagerly as possible. >> >> >> >> >> >> On Mon, Aug 17, 2020 at 1:16 PM Leiyi Zhang <[email protected]> wrote: >> >> >> > >> >> >> > for GBK wtih AfterCount(3), java sdk results in this and python sdk >> >> >> > results in this >> >> >> > >> >> >> > for global count with aftercount(2), java sdk results in 2 2 2 2 and >> >> >> > python sdk results in 8. >> >> >> > >> >> >> > On Mon, Aug 17, 2020 at 12:40 PM Robert Bradshaw >> >> >> > <[email protected]> wrote: >> >> >> >> >> >> >> >> What results do you get in each? >> >> >> >> >> >> >> >> On Mon, Aug 17, 2020 at 11:55 AM Leiyi Zhang <[email protected]> >> >> >> >> wrote: >> >> >> >> > >> >> >> >> > Hi everyone! >> >> >> >> > I noticed that the behavior of AfterCount() trigger seems to be >> >> >> >> > different between python sdk and the java one, so I created a few >> >> >> >> > tests to show the difference, but in general I think the python >> >> >> >> > sdk will buffer on result instead of input elements. >> >> >> >> > >> >> >> >> > What do you guys think? >> >> >> >> > >> >> >> >> > and here are the tests. I ran them in batch mode. >> >> >> >> > >> >> >> >> > Sincerely, >> >> >> >> > Leiyi
