[ https://issues.apache.org/jira/browse/BEAM-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127927#comment-16127927 ]
Kenneth Knowles commented on BEAM-2671:
---------------------------------------
[~staslev] Previously, each timer was processed separately, and the state for
a window was cleared when a timer arrived with timestamp == GC time. This was
actually wrong relative to the spec, and led to the wrong number of outputs in
some situations where the end-of-window (EOW) and GC timers came in the same
bundle. You'd get multiple outputs (which is OK but wasteful), but they would
be labeled incorrectly as to whether they were the first or final output.

Now, all timers for a window's lifecycle are processed at once. In fact, the
timers themselves are irrelevant: when any of them arrives, the window is
"activated", and its state is cleared if the watermark is far enough along
that the window has expired. This automatically fixes the PaneInfo for the
corner cases where the EOW and GC timers arrive together, are very delayed,
etc.
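
To make the "activation" idea concrete, here is a toy sketch in Python. It is
not the actual Beam code: the names Timer and process_bundle, the dict-based
state, and the inclusive watermark comparison are all assumptions made for
illustration, and triggers are not modeled at all. It only shows that several
timers for one window in the same bundle collapse into a single activation,
and that the watermark alone decides whether the pane is final and the state
is cleared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timer:
    window: str     # hypothetical window identifier
    timestamp: int  # firing timestamp; deliberately unused in the decision


def process_bundle(timers, state, gc_time, watermark):
    """Activate each window once per bundle, however many timers it has.

    state:   dict mapping window -> buffered elements
    gc_time: dict mapping window -> garbage-collection time
    Returns a list of (window, elements, is_final) tuples.
    """
    outputs = []
    # dict.fromkeys de-duplicates windows while keeping arrival order.
    for window in dict.fromkeys(t.window for t in timers):
        if window not in state:
            continue  # state already cleared, nothing to activate
        # Assumption: the window counts as expired once watermark >= GC time.
        expired = watermark >= gc_time[window]
        outputs.append((window, list(state[window]), expired))
        if expired:
            del state[window]  # clear only when the watermark says it is safe
    return outputs
```

With an EOW timer at 10 and a GC timer at 15 arriving in the same bundle at
watermark 20, this yields one output for the window, marked final, and the
state is gone afterwards.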

Technically, the GC timer should only ever be delivered once the watermark is
that far along, so on a correct runner the actual GC time is unchanged.

If a runner was delivering the GC timer early, the old logic would still have
GC'd, but the new logic won't. If a timer comes in with timestamp == GC time
but the watermark is not actually far enough along to GC safely, it will not
cause a GC. It would be a bug to deliver that timer, and it could cause other
erroneous results due to early clearing of state: data would come in, would
not be dropped, and would then be output. But mostly it would have been rare
to see the failure. Now you'd see it all the time.
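
The difference for an early-delivered GC timer can be sketched as two
predicates. This is a simplified illustration in Python, not the real
implementation: old_should_gc and new_should_gc are hypothetical names, and
the inclusive comparison against the watermark is an assumption.

```python
def old_should_gc(timer_timestamp, gc_time, watermark):
    # Old behavior as described above: trust the timer's timestamp and
    # ignore where the watermark actually is.
    return timer_timestamp == gc_time


def new_should_gc(timer_timestamp, gc_time, watermark):
    # New behavior as described above: the timer only activates the window;
    # the watermark alone decides whether the window has expired.
    # Assumption: expiry is an inclusive comparison.
    return watermark >= gc_time
```

With gc_time = 15 and a timer stamped 15 delivered while the watermark is
still at 12, old_should_gc returns True and new_should_gc returns False. Once
the watermark reaches 15, both agree.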
> CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark
> runner
> --------------------------------------------------------------------------------
>
> Key: BEAM-2671
> URL: https://issues.apache.org/jira/browse/BEAM-2671
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Etienne Chauchot
> Assignee: Jean-Baptiste Onofré
> Fix For: 2.2.0
>
>
> Error message:
> Flatten.Iterables/FlattenIterables/FlatMap/ParMultiDo(Anonymous).out0:
> Expected: iterable over [] in any order
> but: Not matched: "late"