Sam Whittle created BEAM-11034:
----------------------------------

             Summary: State garbage collection timers set by Dataflow 
SimpleParDoFn pile up for the GlobalWindow
                 Key: BEAM-11034
                 URL: https://issues.apache.org/jira/browse/BEAM-11034
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Sam Whittle
            Assignee: Sam Whittle


If the dofn is stateful, garbage collection timers are set for the end of the 
window plus allowed lateness:
https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491

For the global window this ends up setting garbage collection timers that will 
only fire once the pipeline is drained.  For pipelines that have constantly 
newly arriving unique stateful keys, and otherwise cleanup their state 
appropriately when triggering occurs, the # of timers builds up over time.

Example window and trigger, where the user has the opportunity to clean up 
state for the key after at most a minute.  However they have no control over 
the timer set.

GlobalWindows()
.triggering(Repeatedly.forever(AfterFirst.of(
AfterPane.elementCountAtLeast(5000),
AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to