[
https://issues.apache.org/jira/browse/BEAM-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boyuan Zhang reassigned BEAM-11417:
-----------------------------------
Assignee: Boyuan Zhang
> StreamingDataflowWorker can leak UnboundedSource finalization callbacks
> -----------------------------------------------------------------------
>
> Key: BEAM-11417
> URL: https://issues.apache.org/jira/browse/BEAM-11417
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Daniel Mills
> Assignee: Boyuan Zhang
> Priority: P1
>
> StreamingDataflowWorker keeps a map of finalization callbacks
> ([https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L401).]
> If the Dataflow service loses a callback ID (due to autoscaling etc; they
> are best-effort), the callback will stay around forever.
> This can cause a relatively rapid memory leak for sources like KafkaIO where
> the callback (the KafkaCheckpointMark) has a reference to the
> KafkaUnboundedReader object, which keeps a KafkaConsumer object alive.
> A simple fix would be to change the ConcurrentHashMap to a guava Cache with a
> timeout on elements.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)