[ 
https://issues.apache.org/jira/browse/BEAM-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Qiu updated BEAM-11034:
-----------------------------
    Fix Version/s: 2.25.0

> State garbage collection timers set by Dataflow SimpleParDoFn pile up for the 
> GlobalWindow
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-11034
>                 URL: https://issues.apache.org/jira/browse/BEAM-11034
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>             Fix For: 2.25.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If the DoFn is stateful, garbage collection timers are set for the end of the 
> window plus the allowed lateness:
> https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491
> For the global window, this sets garbage collection timers that fire only 
> once the pipeline is drained.  In pipelines that continually receive new 
> unique stateful keys, and that otherwise clean up their state appropriately 
> when triggering occurs, the number of timers builds up over time.
> Example window and trigger, where the user has the opportunity to clean up 
> state for the key after at most a minute.  However, they have no control over 
> the garbage collection timer that is set:
> Window.into(new GlobalWindows())
>     .triggering(Repeatedly.forever(AfterFirst.of(
>         AfterPane.elementCountAtLeast(5000),
>         AfterProcessingTime.pastFirstElementInPane()
>             .plusDelayOf(Duration.standardMinutes(1)))))
>     .discardingFiredPanes()
>     .withAllowedLateness(Duration.ZERO);
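The pile-up described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Beam or Dataflow code: the class GcTimerSketch, its methods, and the constant GLOBAL_WINDOW_SIZE_MILLIS are hypothetical names, the global window is approximated by one very large window, and a simple counter stands in for the watermark. Each new key sets one garbage collection timer at the end of its window plus allowed lateness, and a timer is removed once the simulated clock passes it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-key garbage collection timers (hypothetical, not Beam code). */
final class GcTimerSketch {
    // Assumption: a very large window size stands in for the global window.
    static final long GLOBAL_WINDOW_SIZE_MILLIS = Long.MAX_VALUE / 2;

    /** End of the window of the given size that contains the timestamp. */
    static long windowEnd(long timestampMillis, long windowSizeMillis) {
        return (timestampMillis / windowSizeMillis + 1) * windowSizeMillis;
    }

    /** GC time is the end of the window plus the allowed lateness. */
    static long gcTime(long windowEndMillis, long allowedLatenessMillis) {
        return windowEndMillis + allowedLatenessMillis;
    }

    /**
     * Simulates one new unique key per second. Each key sets one GC timer.
     * Returns the number of timers still pending after the last key arrives.
     */
    static int pendingTimers(int uniqueKeys, long windowSizeMillis, long allowedLatenessMillis) {
        Map<String, Long> timers = new HashMap<>();
        for (int i = 0; i < uniqueKeys; i++) {
            final long now = i * 1000L; // simulated watermark
            long end = windowEnd(now, windowSizeMillis);
            timers.put("key-" + i, gcTime(end, allowedLatenessMillis));
            // Fire (remove) every timer whose time has been reached.
            timers.values().removeIf(t -> t <= now);
        }
        return timers.size();
    }

    public static void main(String[] args) {
        // One-minute windows: only timers for the current window remain.
        System.out.println(pendingTimers(1000, 60_000L, 0L));
        // Global-like window: no timer ever fires, so one remains per key.
        System.out.println(pendingTimers(1000, GLOBAL_WINDOW_SIZE_MILLIS, 0L));
    }
}
```

In this model, one-minute windows keep the pending timer count bounded by the number of keys seen in the current window, while the global-like window retains one timer per key for as long as the simulation runs. That is the growth the issue reports for pipelines with constantly arriving unique keys.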



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
