[ 
https://issues.apache.org/jira/browse/BEAM-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang reassigned BEAM-11417:
-----------------------------------

    Assignee: Boyuan Zhang

> StreamingDataflowWorker can leak UnboundedSource finalization callbacks
> -----------------------------------------------------------------------
>
>                 Key: BEAM-11417
>                 URL: https://issues.apache.org/jira/browse/BEAM-11417
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Daniel Mills
>            Assignee: Boyuan Zhang
>            Priority: P1
>
> StreamingDataflowWorker keeps a map of finalization callbacks 
> (https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L401).
> If the Dataflow service loses a callback ID (for example during autoscaling; 
> callbacks are best-effort), the callback stays in the map forever.
> This can cause a relatively rapid memory leak for sources like KafkaIO where 
> the callback (the KafkaCheckpointMark) has a reference to the 
> KafkaUnboundedReader object, which keeps a KafkaConsumer object alive.
> A simple fix would be to replace the ConcurrentHashMap with a Guava Cache 
> that expires entries after a timeout.
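
A minimal sketch of the timeout idea described above, written against the JDK only so that it is self-contained (Guava's CacheBuilder with expireAfterWrite provides equivalent expiry off the shelf). The class name ExpiringCallbacks, its methods, the TTL value, and the injected clock are all hypothetical and are not taken from StreamingDataflowWorker; this only illustrates how expiring entries would let a lost callback, and whatever it references, become collectable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the worker's callback map; entries expire after a fixed TTL. */
final class ExpiringCallbacks {
  /** A callback plus the time after which it is considered lost. */
  private static final class Entry {
    final Runnable callback;
    final long deadlineMillis;

    Entry(Runnable callback, long deadlineMillis) {
      this.callback = callback;
      this.deadlineMillis = deadlineMillis;
    }
  }

  private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock; // injected so expiry can be tested deterministically

  ExpiringCallbacks(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Stores a callback under the given ID and sweeps out entries that have timed out. */
  void register(long id, Runnable callback) {
    evictExpired();
    entries.put(id, new Entry(callback, clock.getAsLong() + ttlMillis));
  }

  /** Removes the callback and runs it; returns false if the ID is unknown or has expired. */
  boolean finalizeCheckpoint(long id) {
    Entry entry = entries.remove(id);
    if (entry == null || entry.deadlineMillis <= clock.getAsLong()) {
      return false;
    }
    entry.callback.run();
    return true;
  }

  /** Drops every entry whose deadline has passed, releasing what the callback references. */
  void evictExpired() {
    long now = clock.getAsLong();
    entries.values().removeIf(entry -> entry.deadlineMillis <= now);
  }

  int size() {
    return entries.size();
  }
}
```

With a real clock, the second constructor argument would be System::currentTimeMillis. The TTL has to be longer than the longest delay at which the service can still legitimately deliver a callback ID, otherwise valid finalizations would be dropped.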



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
