Thomas Groh created BEAM-1372:
---------------------------------
Summary: OutputTimeFn and Accumulating Mode is Confusing
Key: BEAM-1372
URL: https://issues.apache.org/jira/browse/BEAM-1372
Project: Beam
Issue Type: Bug
Components: beam-model
Reporter: Thomas Groh
See [here|https://github.com/tgroh/beam/commit/2238df334a368ce1a41e14ee616be954c5430c73]
for an example pipeline.
The timestamp assigned to a pane does not depend on the accumulation mode of
the windowing strategy. As a result, elements that carry their own timestamps
cannot be safely reassigned to those timestamps after a GroupByKey if more
than one pane could have been produced, regardless of the {{OutputTimeFn}}. The
first example pipeline demonstrates two PCollections where the elements within
the last PCollection cannot be reassigned to their timestamps, even though we
are using {{OutputTimeFn#outputAtEarliestInputTimestamp}} and
This is even more confusing with a more complex windowing strategy such as
sessions. A session that spans more than one of the downstream windows, and
that is produced in multiple panes, is assigned to later and later windows as
more panes are produced. A pipeline that produces session windows and wants to
group the sessions by the point at which they started must therefore only ever
produce a single pane per session.
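
The effect can be sketched with a toy model in plain Java. This is not Beam
code and does not use the Beam API: the class {{PaneTimestampSketch}}, its
helper methods, the sample timestamps and the downstream window size are all
hypothetical. It assumes that each pane's timestamp is the earliest timestamp
among the inputs that arrived since the previous firing, in both accumulation
modes, which is one reading of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: no Beam classes are used, and all names are hypothetical.
public class PaneTimestampSketch {

    // Assumed pane timestamp: earliest timestamp among the inputs that
    // arrived since the previous firing (earliest-input-timestamp behaviour).
    public static long earliestTimestamp(List<Long> newInputs) {
        return Collections.min(newInputs);
    }

    // Start of the fixed downstream window of the given size that contains
    // the given timestamp.
    public static long fixedWindowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    public static void main(String[] args) {
        long downstreamWindowSize = 10L;

        // Each inner list holds the timestamps of the elements that arrive
        // before one trigger firing of the same upstream window or session.
        List<List<Long>> firings = Arrays.asList(
                Arrays.asList(3L, 7L),
                Arrays.asList(12L, 15L),
                Arrays.asList(24L));

        // Accumulating mode: every pane contains all elements seen so far.
        List<Long> accumulated = new ArrayList<>();

        for (List<Long> newInputs : firings) {
            accumulated.addAll(newInputs);

            long paneTimestamp = earliestTimestamp(newInputs);
            long earliestElement = Collections.min(accumulated);

            System.out.println("pane contents=" + accumulated
                    + " paneTimestamp=" + paneTimestamp
                    + " downstreamWindowStart="
                    + fixedWindowStart(paneTimestamp, downstreamWindowSize)
                    + " containsElementEarlierThanPane="
                    + (earliestElement < paneTimestamp));
        }
    }
}
```

Under these assumptions, the second and third panes still contain the element
with timestamp 3, but the panes themselves carry timestamps 12 and 24.
Reassigning that element to its own timestamp would move it backwards in time
relative to the pane, and each successive pane lands in a later downstream
window.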
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)