[https://issues.apache.org/jira/browse/BEAM-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849043#comment-15849043]
Kenneth Knowles commented on BEAM-1372:
---------------------------------------
Once OutputTimeFn is removed and replaced with an enum (that is, no fixed
interface), it will be natural to adjust individual bits of behavior. We can
then have enum values that express broader behavior, such as holding to the
MIN of data across all panes, and we could also reject pipelines with
unreasonable combinations.
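
To make the enum idea concrete, here is a rough sketch in Python rather than the Java SDK. Every name in it (OutputTime, AccumulationMode, validate) is hypothetical and does not come from Beam, and the rule for what counts as an "unreasonable combination" is only an assumption chosen for illustration.

```python
from enum import Enum


class OutputTime(Enum):
    """Hypothetical enum standing in for OutputTimeFn (not a Beam API)."""
    EARLIEST = "earliest"                          # earliest input timestamp
    LATEST = "latest"                              # latest input timestamp
    END_OF_WINDOW = "end_of_window"                # end of the window
    MIN_ACROSS_ALL_PANES = "min_across_all_panes"  # hold to the MIN across panes


class AccumulationMode(Enum):
    """Hypothetical stand-in for the accumulation mode of a windowing strategy."""
    DISCARDING = "discarding"
    ACCUMULATING = "accumulating"


def validate(output_time, mode, may_fire_multiple_panes):
    """Reject combinations that this sketch ASSUMES are unreasonable.

    Assumed rule: with accumulating panes that can fire more than once,
    EARLIEST and LATEST are rejected, because later panes would carry
    elements that no longer match the pane timestamp.
    """
    per_pane = (OutputTime.EARLIEST, OutputTime.LATEST)
    if (
        mode is AccumulationMode.ACCUMULATING
        and may_fire_multiple_panes
        and output_time in per_pane
    ):
        raise ValueError(
            f"{output_time.name} with accumulating multi-pane output is rejected"
        )
    return output_time
```

With a closed enum, a check like this can run when the pipeline is built, which a user-supplied OutputTimeFn implementation would not allow.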
> OutputTimeFn and Accumulating Mode is Confusing
> -----------------------------------------------
>
> Key: BEAM-1372
> URL: https://issues.apache.org/jira/browse/BEAM-1372
> Project: Beam
> Issue Type: Bug
> Components: beam-model
> Reporter: Thomas Groh
>
> See [here|https://github.com/tgroh/beam/commit/2238df334a368ce1a41e14ee616be954c5430c73]
> for an example pipeline.
> The timestamp used by a pane does not change based on the accumulation mode
> of the windowing strategy. As a result, elements that have associated
> timestamps cannot be safely reassigned to those timestamps after a
> GroupByKey if more than one pane could have been produced, regardless of the
> {{OutputTimeFn}}. The first example pipeline demonstrates two PCollections
> where the elements within the last PCollection cannot be reassigned to their
> timestamps, even though we are using
> {{OutputTimeFn#outputAtEarliestInputTimestamp}} and
> When using a more complex windowing strategy like sessions, this is even more
> confusing. A session that spans more than one of the downstream windows but
> is produced in multiple panes will, over time, be assigned to later and
> later windows as more panes are produced. Thus, a pipeline that produces
> session windows and wishes to group the sessions by the point at which they
> started must only ever produce a single pane per session.
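
The session behavior described above can be reproduced with a small simulation. This is a simplified model in plain Python, not Beam code. It assumes that each pane's timestamp is the earliest timestamp among the elements that arrived since the previous firing, while an accumulating pane contains every element seen so far. The function names and sample timestamps are made up for illustration.

```python
def pane_timestamps(firings, hold_min_across_panes=False):
    """Model accumulating panes for one session.

    firings: list of non-empty lists, each holding the timestamps of the
    elements that arrived since the previous firing (assumed model).
    Returns a list of (pane_timestamp, pane_contents) tuples.
    """
    seen = []
    panes = []
    for new_elements in firings:
        seen.extend(new_elements)
        if hold_min_across_panes:
            # Alternative from the comment: hold to the MIN across all panes.
            timestamp = min(seen)
        else:
            # Assumed current behavior: only the new elements are considered.
            timestamp = min(new_elements)
        panes.append((timestamp, sorted(seen)))
    return panes


def fixed_window_start(timestamp, size):
    """Start of the downstream fixed window containing a non-negative timestamp."""
    return timestamp - timestamp % size


# One session (gaps between elements stay under 10) fired as three panes.
firings = [[1, 4], [12, 15], [23]]

per_pane = pane_timestamps(firings)
downstream = [fixed_window_start(ts, 10) for ts, _ in per_pane]

held = pane_timestamps(firings, hold_min_across_panes=True)
downstream_held = [fixed_window_start(ts, 10) for ts, _ in held]

print(downstream)       # windows drift later with each pane
print(downstream_held)  # every pane stays in the window where the session started
```

In this model the second pane has timestamp 12 but still contains the element with timestamp 1, so that element cannot be moved back to its own timestamp. Holding to the minimum across panes keeps all three panes in the first downstream window.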
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)