[ 
https://issues.apache.org/jira/browse/BEAM-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863721#comment-15863721
 ] 

Aljoscha Krettek commented on BEAM-1346:
----------------------------------------

[~kenn] another thing that crossed my mind is elements being pushed back 
because their side input is not ready yet. Think {{PushbackSideInputRunner}} 
and the equivalent implementations in other runners, if they have one. It is 
related to this issue, but in the end we probably need a separate issue for it.

The problem occurs when a runner has a special implementation of "combine" 
that does not simply expand to {{GroupByKey | ParDo(CombineFn)}}, where the 
first step is {{GroupByKey: KV<K, V> → KV<K, List<V>>}}. The {{CombineFn}} can 
access side inputs, and which side-input window it sees is determined by the 
window that the value has after merging (as is evident from the proper 
definition of combine given above). {{PushbackSideInputRunner}}, however, only 
considers the (proto-)window that the value has before merging, so both the 
pushing back and the decision about when a side input is ready are based on 
the wrong information.
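
To make the mismatch concrete, here is a minimal standalone sketch in plain 
Java. It does not use any Beam classes. It assumes session windows with a gap 
of 10 on the main input, fixed windows of size 10 on the side input, and that 
the side-input window is the one containing the main window's maximum 
timestamp. All names in it ({{SideInputWindowSketch}}, {{protoWindow}}, 
{{merge}}, {{sideInputWindowFor}}) are hypothetical and exist only for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SideInputWindowSketch {

    // Assumed parameters, chosen only for illustration.
    public static final long SESSION_GAP = 10;
    public static final long SIDE_INPUT_WINDOW_SIZE = 10;

    /** Half-open interval [start, end) standing in for a window. */
    public static final class Window {
        public final long start;
        public final long end;

        public Window(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Last timestamp that still belongs to this window. */
        public long maxTimestamp() {
            return end - 1;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Proto-window assigned to a single element before any merging. */
    public static Window protoWindow(long timestamp) {
        return new Window(timestamp, timestamp + SESSION_GAP);
    }

    /** Merges overlapping windows; the input must be sorted by start. */
    public static List<Window> merge(List<Window> sortedWindows) {
        List<Window> merged = new ArrayList<>();
        for (Window w : sortedWindows) {
            int last = merged.size() - 1;
            if (last >= 0 && w.start < merged.get(last).end) {
                Window prev = merged.get(last);
                merged.set(last, new Window(prev.start, Math.max(prev.end, w.end)));
            } else {
                merged.add(w);
            }
        }
        return merged;
    }

    /** Assumed mapping: the fixed window containing the main window's max timestamp. */
    public static Window sideInputWindowFor(Window mainWindow) {
        long start = Math.floorDiv(mainWindow.maxTimestamp(), SIDE_INPUT_WINDOW_SIZE)
                * SIDE_INPUT_WINDOW_SIZE;
        return new Window(start, start + SIDE_INPUT_WINDOW_SIZE);
    }

    public static void main(String[] args) {
        List<Window> protos = new ArrayList<>();
        for (long ts : new long[] {5, 12, 18}) {
            protos.add(protoWindow(ts));
        }
        Window first = protos.get(0);
        Window merged = merge(protos).get(0);

        // What a pushback check based on the proto-window would wait for.
        System.out.println("proto  " + first + " -> side input " + sideInputWindowFor(first));
        // What the CombineFn would actually read after merging.
        System.out.println("merged " + merged + " -> side input " + sideInputWindowFor(merged));
    }
}
```

Under these assumptions, the element at timestamp 5 has proto-window [5, 15), 
which maps to side-input window [10, 20), while the merged session [5, 28) 
maps to [20, 30). A readiness check based on the proto-window therefore waits 
for a different side-input window than the one the {{CombineFn}} ends up 
reading.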

Do you agree or is that just me getting a little paranoid with the whole 
merging stuff? ;-)

> Drop Late Data in ReduceFnRunner
> --------------------------------
>
>                 Key: BEAM-1346
>                 URL: https://issues.apache.org/jira/browse/BEAM-1346
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>    Affects Versions: 0.5.0
>            Reporter: Aljoscha Krettek
>
> I think these two commits recently broke late-data dropping for the Flink 
> Runner (and maybe for other runners as well):
> - https://github.com/apache/beam/commit/2b26ec8
> - https://github.com/apache/beam/commit/8989473
> It boils down to the {{LateDataDroppingDoFnRunner}} no longer being used, 
> because {{DoFnRunners.lateDataDroppingRunner()}} is no longer called when a 
> {{DoFn}} is a {{ReduceFnExecutor}} (that interface was removed).
> Maybe we should think about dropping late data in another place; my 
> suggestion is {{ReduceFnRunner}}, but that's open for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
