Thomas Groh created BEAM-1372:
---------------------------------

             Summary: OutputTimeFn and Accumulating Mode is Confusing
                 Key: BEAM-1372
                 URL: https://issues.apache.org/jira/browse/BEAM-1372
             Project: Beam
          Issue Type: Bug
          Components: beam-model
            Reporter: Thomas Groh


See [here|https://github.com/tgroh/beam/commit/2238df334a368ce1a41e14ee616be954c5430c73]
for an example pipeline.

The timestamp used by a pane does not change based on the accumulation mode of 
the windowing strategy. As a result, elements that carry their own timestamps 
cannot safely be reassigned to those timestamps after a GroupByKey if more 
than one pane could have been produced, regardless of the {{OutputTimeFn}}. The 
first example pipeline demonstrates two PCollections where the elements within 
the last PCollection cannot be reassigned to their timestamps, even though we 
are using {{OutputTimeFn#outputAtEarliestInputTimestamp}} and 
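
The mismatch between pane timestamps and pane contents can be sketched with a toy model that does not use the Beam SDK. It assumes, as one reading of this issue, that a pane's timestamp is derived only from the elements that arrived since the previous firing (here, the earliest of them), while an accumulating pane still contains the elements of earlier firings. The class name, method names, and timestamps are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of pane timestamps. Hypothetical names; this is not Beam API. */
public class PaneTimestampSketch {

  /** Assumption: the pane timestamp is the earliest timestamp among the NEW elements. */
  static long paneTimestamp(List<Long> newElementTimestamps) {
    return Collections.min(newElementTimestamps);
  }

  /** Returns the element timestamps in a pane that lie before the pane's own timestamp. */
  static List<Long> elementsBehindPane(List<Long> paneContents, long paneTimestamp) {
    List<Long> behind = new ArrayList<>();
    for (long ts : paneContents) {
      if (ts < paneTimestamp) {
        behind.add(ts);
      }
    }
    return behind;
  }

  public static void main(String[] args) {
    List<Long> firstFiring = Arrays.asList(2L, 5L); // elements seen before the first pane
    List<Long> secondFiring = Arrays.asList(8L);    // elements seen before the second pane

    // The second pane's timestamp is the same in both accumulation modes.
    long secondPaneTimestamp = paneTimestamp(secondFiring);

    // Accumulating mode: the second pane repeats the first pane's elements.
    List<Long> accumulating = new ArrayList<>(firstFiring);
    accumulating.addAll(secondFiring);

    // Discarding mode: the second pane holds only the new elements.
    List<Long> discarding = secondFiring;

    System.out.println("second pane timestamp: " + secondPaneTimestamp);
    System.out.println("accumulating, behind pane: "
        + elementsBehindPane(accumulating, secondPaneTimestamp));
    System.out.println("discarding, behind pane: "
        + elementsBehindPane(discarding, secondPaneTimestamp));
  }
}
```

Under these assumptions the accumulating pane contains elements at 2 and 5 while the pane itself sits at 8, so reassigning those elements to their own timestamps would move them backwards in time. The discarding pane has no such elements.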

With a more complex windowing strategy such as sessions, this is even more 
confusing. A session that spans more than one of the downstream windows and 
is produced in multiple panes will be assigned to later and later windows as 
more panes are produced. Thus, a pipeline that produces session windows and 
wishes to group the sessions by the point at which they started must only 
ever produce a single pane per session.
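
The session case can be sketched the same way, again as a toy model rather than Beam code. It assumes each pane's timestamp is the earliest timestamp among the elements new to that firing, and that downstream fixed windows are 10 units wide. The class name, window size, and element timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of one session emitted in several panes. Hypothetical names; not Beam API. */
public class SessionPaneSketch {

  /** Hypothetical size of the downstream fixed windows. */
  static final long WINDOW_SIZE = 10;

  /** Start of the fixed window containing a non-negative timestamp. */
  static long downstreamWindowStart(long timestamp) {
    return timestamp - (timestamp % WINDOW_SIZE);
  }

  /**
   * For each firing of the session, computes the downstream window its pane lands in.
   * Assumption: the pane timestamp is the earliest timestamp among that firing's new elements.
   */
  static List<Long> windowsForPanes(List<List<Long>> firings) {
    List<Long> windowStarts = new ArrayList<>();
    for (List<Long> newElements : firings) {
      long paneTimestamp = Collections.min(newElements);
      windowStarts.add(downstreamWindowStart(paneTimestamp));
    }
    return windowStarts;
  }

  public static void main(String[] args) {
    // One session that starts at timestamp 3 and is emitted in three panes.
    List<List<Long>> firings = Arrays.asList(
        Arrays.asList(3L, 6L),
        Arrays.asList(12L, 14L),
        Arrays.asList(21L));

    // Grouping by session start would put every pane in the window starting at 0.
    System.out.println("window of session start: " + downstreamWindowStart(3L));
    System.out.println("windows actually used:   " + windowsForPanes(firings));
  }
}
```

In this model the three panes of a single session land in the windows starting at 0, 10, and 20, although the session began at 3. Only the first pane ends up in the window of the session start.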



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
