Re: python AfterCount() trigger behavior not aligned with java

Leiyi Zhang Tue, 18 Aug 2020 13:34:53 -0700

Thank you Robert!

On Mon, Aug 17, 2020 at 5:52 PM Robert Bradshaw <[email protected]> wrote:


> Correct, everything is per-key. To allow triggering after n events you
> would have to given them all the same key. (Note that this would
> potentially introduce a bottleneck, as they would all be shuffled to
> the same machine.)
>
> On Mon, Aug 17, 2020 at 4:01 PM Leiyi Zhang <[email protected]> wrote:
> >
> > Thank you for explaining the details Robert!
> >
> > I do have 1 more question: is there a way in beam that allows triggering
> after n events arrived at the gbk step from previous parts of a pipeline?
> because afterCount() trigger for gbk is per-key, then it cannot be used
> here right?
> >
> > On Mon, Aug 17, 2020 at 3:22 PM Robert Bradshaw <[email protected]>
> wrote:
> >>
> >> Yeah, this is another subtlety. There's a notion of "window garbage
> >> collection" that's distinct from "window closing." Garbage collection
> >> happens regardless of whether the trigger was set iff the window is
> >> non-empty when the watermark + allowed lateness exceeds the end of
> >> window. (Well, there's a flag to control this behavior.) This has
> >> changed over time and I think you've uncovered a bug that the two SDKs
> >> are not consistent here.
> >>
> >> On Mon, Aug 17, 2020 at 2:29 PM Leiyi Zhang <[email protected]> wrote:
> >> >
> >> > is it just for the python sdk or for both python and java sdk?
> >> > seems like java sdk will output result even if there are less than 3
> elements per key.
> >> >
> >> > On Mon, Aug 17, 2020 at 2:20 PM Robert Bradshaw <[email protected]>
> wrote:
> >> >>
> >> >> Yes, GBK is non-determanistic in the face of triggers as well. All
> >> >> triggers are per-key, evaluated independently for each key, so it'd
> be
> >> >> "do I have at least 3 results for this key."
> >> >>
> >> >> On Mon, Aug 17, 2020 at 2:08 PM Leiyi Zhang <[email protected]>
> wrote:
> >> >> >
> >> >> > Thank you very much for the reply,
> >> >> > Is the result of gbk non-deterministic as well? between "do I have
> at least 3 results PER KEY" vs "do I have at least 3 incoming events before
> I trigger GBK"
> >> >> >
> >> >> > On Mon, Aug 17, 2020 at 1:39 PM Robert Bradshaw <
> [email protected]> wrote:
> >> >> >>
> >> >> >> Triggers in beam are non-determanistic; both behaviors are
> acceptable
> >> >> >> (especially for batch mode). In practice, production runners
> evaluate
> >> >> >> triggers (e.g. in this case "do I have at least two elements")
> whenver
> >> >> >> a new batch of data comes in (for the Python direct runner, in
> batch
> >> >> >> mode, all the data comes in at once). To have more control over
> this
> >> >> >> you can use TestPipeline, which will attempt to fire triggers as
> >> >> >> eagerly as possible.
> >> >> >>
> >> >> >> On Mon, Aug 17, 2020 at 1:16 PM Leiyi Zhang <[email protected]>
> wrote:
> >> >> >> >
> >> >> >> > for GBK wtih AfterCount(3), java sdk results in this and python
> sdk results in this
> >> >> >> >
> >> >> >> > for global count with aftercount(2), java sdk results in 2 2 2
> 2 and python sdk results in 8.
> >> >> >> >
> >> >> >> > On Mon, Aug 17, 2020 at 12:40 PM Robert Bradshaw <
> [email protected]> wrote:
> >> >> >> >>
> >> >> >> >> What results do you get in each?
> >> >> >> >>
> >> >> >> >> On Mon, Aug 17, 2020 at 11:55 AM Leiyi Zhang <
> [email protected]> wrote:
> >> >> >> >> >
> >> >> >> >> > Hi everyone!
> >> >> >> >> > I noticed that the behavior of AfterCount() trigger seems to
> be different between python sdk and the java one, so I created a few tests
> to show the difference, but in general I think the python sdk will buffer
> on result instead of input elements.
> >> >> >> >> >
> >> >> >> >> > What do you guys think?
> >> >> >> >> >
> >> >> >> >> > and here are the tests. I ran them in batch mode.
> >> >> >> >> >
> >> >> >> >> > Sincerely,
> >> >> >> >> > Leiyi
>

Re: python AfterCount() trigger behavior not aligned with java

Reply via email to