[
https://issues.apache.org/jira/browse/BEAM-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268405#comment-15268405
]
Aljoscha Krettek commented on BEAM-241:
---------------------------------------
I see, in there you are also relying on the fact that
{{DoFnRunners.createDefault}} wraps it in a {{LataDataDroppingDoFnRunner}}.
I'm curious about how the shuffle-by-key-and-window will work? How will this
work with merging windows, i.e. session windows. (Maybe this is going to far
off-topic and we should take it to the ML.)
> Not easy for runners to get late-data dropping
> ----------------------------------------------
>
> Key: BEAM-241
> URL: https://issues.apache.org/jira/browse/BEAM-241
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Mark Shields
> Assignee: Frances Perry
>
> Quite by accident realized the Flink runner delegates to
> GroupAlsoByWindowViaWindowSetDoFn for GBK, which in turn delegates to
> ReduceFnRunner. The latter now assumes no messages will arrive beyond the
> 'garbage collection' time of their target window(s).
> The Dataflow runner interposes a LateDataDroppingDoFnRunner into the path so
> as to drop those too-late messages. That's done (I think) using
> DoFnRunners.createDefault.
> I don't think the Flink runner does that.
> We need a nice runner-friendly way of dealing with the too-late data.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)