[
https://issues.apache.org/jira/browse/ZOOKEEPER-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Ruel updated ZOOKEEPER-4428:
---------------------------------
Summary: ZooKeeper Server leaks "SyncThread" threads when leadership
connection times out and is reestablished (was: ZooKeeper leaks "SyncThread"
threads when leadership connection times out and is reestablished )
> ZooKeeper Server leaks "SyncThread" threads when leadership connection times
> out and is reestablished
> ------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4428
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.3
> Environment:
> 1. On a follower node of an established ZooKeeper ensemble, run the
>    following command to count the SyncThread threads:
>      ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> 2. On the leader, run the following iptables command to drop traffic coming
>    from the follower used in Step 1:
>      iptables -A INPUT -s <Follower IP Address> -j DROP
> 3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop
>    due to timeout.
> 4. On the leader, run the following iptables command to re-enable traffic
>    coming from the follower used in Step 1:
>      iptables -D INPUT -s <Follower IP Address> -j DROP
> 5. Watch the ZooKeeper logs on the nodes and wait for the connection to the
>    leader to be reestablished.
> 6. On the follower node (used in Step 1), check the number of SyncThread
>    threads again. That value should have increased by one and should stay
>    pinned there indefinitely:
>      ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> Reporter: Ryan Ruel
> Priority: Major
>
> In a production environment with connectivity problems, the ZooKeeper server
> was found to be using over 1000 threads named "SyncThread" that were never
> freed.
> The server logs indicate that these nodes were experiencing connection
> timeouts to the leader.
> A test environment (described in the "Environment" field of this ticket)
> showed that these connection timeouts appear to be what leaks the threads.
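
The before/after thread count in the reproduction steps can be scripted. The following is only a minimal sketch, not part of the original report: the helper names count_sync_threads and leak_delta are hypothetical, and it assumes the server process is named mdtzookeeper as in the reporter's commands. The counting helper reads `ps -T` output on stdin, so it can be tried without a running ensemble.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of `ps -T` output (read from stdin)
# that mention a SyncThread. `grep -c` prints 0 but exits non-zero when
# nothing matches, so `|| true` keeps the exit status clean.
count_sync_threads() {
    grep -c 'SyncThread' || true
}

# Hypothetical helper: difference between two samples (after - before).
# A positive result that never returns to zero matches the reported leak.
leak_delta() {
    echo $(( $2 - $1 ))
}

# Intended use on the follower (process name taken from the report):
#   before=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   ... block and then unblock the follower on the leader with iptables ...
#   after=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   leak_delta "$before" "$after"
```

Unlike a bare `wc`, `grep -c` prints a single number, which makes the two samples easy to compare.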
--
This message was sent by Atlassian Jira
(v8.20.1#820001)