[ https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762680#comment-17762680 ]
Matthias Pohl edited comment on FLINK-33053 at 9/7/23 9:56 AM:
---------------------------------------------------------------
FLINK-29813 already covers the migration to {{CuratorCache}}. I haven't had a
chance to look into it yet. We need to analyze whether switching would cause
any side effects. But we might still want to understand where the thread leak
is coming from.
> Watcher leak in Zookeeper HA mode
> ---------------------------------
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0, 1.17.1
> Reporter: Yangze Guo
> Priority: Critical
>
> We observe a watcher leak in our OLAP stress test when Zookeeper HA mode is
> enabled. The TM's watches on the JobMaster leader are not removed after the
> job has finished.
> Here is how we reproduce this issue:
> - Start a session cluster and enable Zookeeper HA mode.
> - Continuously and concurrently submit short queries, e.g. WordCount, to the
> cluster.
> - Run echo -n wchp | nc {zk host} {zk port} to get the current watches.
> We can see a lot of watches on
> /flink/{cluster_name}/leader/{job_id}/connection_info.
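
The check in the last reproduction step could be scripted roughly as follows. This is only a sketch, not part of the original report: the helper names are hypothetical, and it assumes that wchp replies with one znode path per line followed by tab-indented session ids, and that the wchp four-letter command is enabled on the ZooKeeper server (newer ZooKeeper versions require it to be whitelisted via 4lw.commands.whitelist).

```python
import socket
from collections import Counter


def send_four_letter_word(host, port, command="wchp", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply.

    Equivalent in spirit to: echo -n wchp | nc {zk host} {zk port}
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_watches_by_path(wchp_output):
    """Count watching sessions per znode path.

    Assumed reply layout: a line starting with "/" names a path, and each
    following non-empty line is one session watching that path.
    """
    counts = Counter()
    current_path = None
    for line in wchp_output.splitlines():
        if not line.strip():
            continue
        if line.startswith("/"):
            current_path = line.strip()
            counts[current_path] += 0  # register the path even without sessions
        elif current_path is not None:
            counts[current_path] += 1
    return counts


def connection_info_watches(counts):
    """Keep only the JobMaster leader connection_info paths that are watched."""
    return {
        path: n
        for path, n in counts.items()
        if path.endswith("/connection_info") and n > 0
    }
```

Calling connection_info_watches(count_watches_by_path(send_four_letter_word(host, port))) repeatedly during the stress test would show whether entries for already finished jobs keep accumulating.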