[ https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762574#comment-17762574 ]
Matthias Pohl edited comment on FLINK-33053 at 9/7/23 5:58 AM:
---------------------------------------------------------------
Thanks for bringing this up, [~guoyangze]. I'm going to have a look at it. From
a brief look at the code, I suspect it's a Curator issue, because we're
properly closing all the resources in the driver. But I have to investigate
further.
Did you, by any chance, check whether this is also observable in 1.18? We
upgraded Curator to 5.4.0 in 1.18.
was (Author: mapohl):
Thanks for bringing this up, [~guoyangze]. I'm going to have a look at it.
> Watcher leak in Zookeeper HA mode
> ---------------------------------
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0, 1.17.1
> Reporter: Yangze Guo
> Priority: Critical
>
> We observe a watcher leak in our OLAP stress test when Zookeeper HA mode is
> enabled. The TM's watches on the JobMaster leader are not removed after the
> job finishes.
> Here is how we reproduce this issue:
> - Start a session cluster and enable Zookeeper HA mode.
> - Continuously and concurrently submit short queries, e.g. WordCount, to the
> cluster.
> - Run echo -n wchp | nc {zk host} {zk port} to list the current watches.
> The output shows a large number of watches on
> /flink/{cluster_name}/leader/{job_id}/connection_info.
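To put a number on the leak rather than eyeballing the raw wchp dump, the output can be reduced to a watch count per connection_info path. This is a minimal sketch, assuming a POSIX shell with awk and sort, and assuming the wchp format of one line per watched path followed by one tab-indented "0x<session id>" line per watching session. The function names are hypothetical helpers, not part of Flink or ZooKeeper. Depending on the ZooKeeper version and configuration, wchp may first have to be allowed via 4lw.commands.whitelist.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Flink or ZooKeeper) that summarize the
# output of ZooKeeper's `wchp` command. Assumed input format: a path line,
# followed by one tab-indented "0x<session id>" line per watching session.

# count_connection_info_watches: reads `wchp` output on stdin and prints
# "<watch count> <path>" for every path ending in /connection_info,
# most-watched paths first.
count_connection_info_watches() {
  awk '
    /^\t/ { if (path ~ /\/connection_info$/) count[path]++; next }  # session line
    { path = $0 }                                                   # path line
    END { for (p in count) print count[p], p }
  ' | sort -rn
}

# total_connection_info_watches: prints the total number of watches on
# connection_info paths, or 0 if there are none.
total_connection_info_watches() {
  count_connection_info_watches | awk '{ s += $1 } END { print s + 0 }'
}

# Example usage against a live server (host and port are placeholders):
#   echo -n wchp | nc "$ZK_HOST" "$ZK_PORT" | count_connection_info_watches
```

Sampling the total repeatedly while short jobs are submitted and finish should show whether the count keeps growing instead of returning to a baseline.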