[ 
https://issues.apache.org/jira/browse/BEAM-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566305#comment-15566305
 ] 

Kenneth Knowles commented on BEAM-696:
--------------------------------------

I think there's some conflation of elements coming out of {{GroupByKey}} and 
bundles. These are actually different in important ways, so I just want to 
separate the terminology. What I was saying only applies because a grouped 
{{KV<K, Iterable<V>>}} coming out of GBK is a single indivisible element, which 
is then processed atomically via {{Combine.GroupedValues}}.

The issue here, as I see it, is what optimizations are applicable to this 
execution. So your point 2, which is I think can also be seen as Aljoscha and 
Pei's point, is that the case of merging windows and side inputs precludes the 
optimization of combining up front, because it is required that the CombineFn 
sees just one value for the side input, and that it corresponds to the window 
in which output is produced.

In other words, "buffer until a trigger is ready and then combine with reading 
of side inputs" is a correct implementation in this case.

There are other subtleties having to do with accumulating mode and speculative 
triggers. The point to emphasize is that the semantics are {{GroupByKey}} 
followed by {{Combine.GroupedValues}} and the rest is optimizations.

> Side-Inputs non-deterministic with merging main-input windows
> -------------------------------------------------------------
>
>                 Key: BEAM-696
>                 URL: https://issues.apache.org/jira/browse/BEAM-696
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Ben Chambers
>            Assignee: Pei He
>
> Side-Inputs are non-deterministic for several reasons:
> 1. Because they depend on triggering of the side-input (this is acceptable 
> because triggers are by their nature non-deterministic).
> 2. They depend on the current state of the main-input window in order to 
> lookup the side-input. This means that with merging
> 3. Any runner optimizations that affect when the side-input is looked up may 
> cause problems with either or both of these.
> This issue focuses on #2 -- the non-determinism of side-inputs that execute 
> within a Merging WindowFn.
> Possible solution would be to defer running anything that looks up the 
> side-input until we need to extract an output, and using the main-window at 
> that point. Specifically, if the main-window is a MergingWindowFn, don't 
> execute any kind of pre-combine, instead buffer all the inputs and combine 
> later.
> This could still run into some non-determinism if there are triggers 
> controlling when we extract output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to