[
https://issues.apache.org/jira/browse/BEAM-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Qiu updated BEAM-11034:
-----------------------------
Status: Resolved (was: Open)
> State garbage collection timers set by Dataflow SimpleParDoFn pile up for the
> GlobalWindow
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-11034
> URL: https://issues.apache.org/jira/browse/BEAM-11034
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P2
> Fix For: 2.25.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> If the dofn is stateful, garbage collection timers are set for the end of the
> window plus allowed lateness:
> https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491
> For the global window this ends up setting garbage collection timers that
> will only fire once the pipeline is drained. For pipelines that have
> constantly newly arriving unique stateful keys, and otherwise cleanup their
> state appropriately when triggering occurs, the # of timers builds up over
> time.
> Example window and trigger, where the user has the opportunity to clean up
> state for the key after at most a minute. However they have no control over
> the timer set.
> GlobalWindows()
> .triggering(Repeatedly.forever(AfterFirst.of(
> AfterPane.elementCountAtLeast(5000),
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
> nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);
--
This message was sent by Atlassian Jira
(v8.3.4#803005)