[
https://issues.apache.org/jira/browse/BEAM-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863721#comment-15863721
]
Aljoscha Krettek commented on BEAM-1346:
----------------------------------------
[~kenn] another thing that crossed my mind is elements being pushed back due to
their side input not being ready. Think {{PushbackSideInputRunner}} and similar
implementations for other runners, if they have it. It's similar to this issue
but in the end we probably need a separate issue.
The problem occurs when you have a special implementation for "combine" that
doesn't simply do {{GroupByKey | ParDo(CombineFn)}} where the first one is
{{GroupByKey: KV<K, V> → KV<K, List<V>>}}. The {{CombineFn}} can access side
inputs and the side input that it can access is determined by the window that
the value has after merging (as evident from the proper definition of combine
given above). {{PushbackSideInputRunner}}, however, only considers the
(proto-)window that the value has before merging so the pushing back and
determining when a side input is ready is based on the wrong information.
Do you agree or is that just me getting a little paranoid with the whole
merging stuff? ;-)
> Drop Late Data in ReduceFnRunner
> --------------------------------
>
> Key: BEAM-1346
> URL: https://issues.apache.org/jira/browse/BEAM-1346
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Affects Versions: 0.5.0
> Reporter: Aljoscha Krettek
>
> I think these two commits recently broke late-data dropping for the Flink
> Runner (and maybe for other runners as well):
> - https://github.com/apache/beam/commit/2b26ec8
> - https://github.com/apache/beam/commit/8989473
> It boils down to the {{LateDataDroppingDoFnRunner}} not being used anymore
> because {{DoFnRunners.lateDataDroppingRunner()}} is not called anymore when a
> {{DoFn}} is a {{ReduceFnExecutor}} (because that interface was removed).
> Maybe we should think about dropping late data in another place, my
> suggestion is {{ReduceFnRunner}} but that's open for discussion.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)