Mark Shields created BEAM-241:
---------------------------------
Summary: Not easy for runners to get late-data dropping
Key: BEAM-241
URL: https://issues.apache.org/jira/browse/BEAM-241
Project: Beam
Issue Type: Bug
Components: runner-core
Reporter: Mark Shields
Assignee: Frances Perry
Quite by accident I realized that the Flink runner delegates to
GroupAlsoByWindowViaWindowSetDoFn for GBK, which in turn delegates to
ReduceFnRunner. The latter now assumes no messages will arrive beyond the
'garbage collection' time of their target window(s).
The Dataflow runner interposes a LateDataDroppingDoFnRunner into the path so as
to drop those too-late messages. That's done (I think) using
DoFnRunners.createDefault.
I don't think the Flink runner does that.
We need a nice runner-friendly way of dealing with the too-late data.
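To make the ask concrete, here is a rough sketch of the check a runner would need in front of ReduceFnRunner. It is plain Java with hypothetical stand-in types (LateDataDroppingSketch, WindowedElement, dropTooLate); none of these are Beam classes. It assumes a window's 'garbage collection' time is its max timestamp plus the allowed lateness, and that an element is too late once the input watermark has passed that time. It is not the actual LateDataDroppingDoFnRunner code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified illustration of late-data dropping; not Beam code. */
final class LateDataDroppingSketch {

    /** Stand-in for a windowed element: a payload plus its target window's max timestamp. */
    static final class WindowedElement {
        final String value;
        final long windowMaxTimestampMillis;

        WindowedElement(String value, long windowMaxTimestampMillis) {
            this.value = value;
            this.windowMaxTimestampMillis = windowMaxTimestampMillis;
        }
    }

    /** Assumption: GC time = window max timestamp + allowed lateness. */
    static long garbageCollectionTime(WindowedElement element, long allowedLatenessMillis) {
        return element.windowMaxTimestampMillis + allowedLatenessMillis;
    }

    /** An element is too late when its window's GC time is strictly before the input watermark. */
    static boolean isTooLate(
            WindowedElement element, long allowedLatenessMillis, long inputWatermarkMillis) {
        return garbageCollectionTime(element, allowedLatenessMillis) < inputWatermarkMillis;
    }

    /** Returns only the elements that may still be handed on to the downstream runner. */
    static List<WindowedElement> dropTooLate(
            List<WindowedElement> input, long allowedLatenessMillis, long inputWatermarkMillis) {
        List<WindowedElement> kept = new ArrayList<>();
        for (WindowedElement element : input) {
            if (!isTooLate(element, allowedLatenessMillis, inputWatermarkMillis)) {
                kept.add(element);
            }
        }
        return kept;
    }
}
```

A runner would apply something like dropTooLate to each incoming bundle, using its current input watermark, before passing elements on to the GBK implementation. An element assigned to several windows would need the check per window, which this sketch leaves out.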