Sam Whittle created BEAM-11034:
----------------------------------
Summary: State garbage collection timers set by Dataflow
SimpleParDoFn pile up for the GlobalWindow
Key: BEAM-11034
URL: https://issues.apache.org/jira/browse/BEAM-11034
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Sam Whittle
Assignee: Sam Whittle
If the dofn is stateful, garbage collection timers are set for the end of the
window plus allowed lateness:
https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491
For the global window this ends up setting garbage collection timers that will
only fire once the pipeline is drained. For pipelines that have constantly
newly arriving unique stateful keys, and otherwise cleanup their state
appropriately when triggering occurs, the # of timers builds up over time.
Example window and trigger, where the user has the opportunity to clean up
state for the key after at most a minute. However they have no control over
the timer set.
GlobalWindows()
.triggering(Repeatedly.forever(AfterFirst.of(
AfterPane.elementCountAtLeast(5000),
AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);
--
This message was sent by Atlassian Jira
(v8.3.4#803005)