[
https://issues.apache.org/jira/browse/ZOOKEEPER-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Ruel updated ZOOKEEPER-4428:
---------------------------------
Environment:
1. On a follower node in an established ZooKeeper ensemble, run the following
command to count the SyncThread threads:
ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
2. On the leader, run the following iptables command to drop traffic coming
from the follower used in Step 1:
iptables -A INPUT -s <Follower IP Address> -j DROP
3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop
due to timeout.
4. On the leader, run the following iptables command to re-enable traffic
coming from the follower used in Step 1:
iptables -D INPUT -s <Follower IP Address> -j DROP
5. Watch the ZooKeeper logs on the nodes and wait for the connection to the
leader to be reestablished.
6. On the follower node (used in Step 1), check the number of SyncThreads again.
The value has increased by one and stays pinned there indefinitely:
ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
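The thread count used in the steps above can be wrapped in a small helper so the before and after values are single numbers that are easy to compare. This is only a sketch: the function names count_sync_threads and sync_threads_for_pid are hypothetical, and it assumes a Linux procps `ps` that supports `-T`, as the commands in this ticket do. It uses `grep -c` rather than `wc` so that only the matching line count is printed.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that mention "SyncThread".
# grep -c prints 0 but exits non-zero when nothing matches, so the
# "|| true" keeps the exit status clean for callers.
count_sync_threads() {
    grep -c 'SyncThread' || true
}

# Hypothetical helper: count SyncThread threads of the process with PID $1.
# Assumes `ps -T` lists one line per thread, as in the steps above.
sync_threads_for_pid() {
    ps -T -p "$1" | count_sync_threads
}
```

For example, `sync_threads_for_pid "$(pidof mdtzookeeper)"` could be run before Step 2 and again after Step 5, and the two numbers compared.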
> ZooKeeper leaks "SyncThread" threads when leadership connection times out and
> is reestablished
> -----------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4428
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.3
> Environment: 1. On a follower node in an established ZooKeeper
> ensemble, run the following command to count the SyncThread threads:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> 2. On the leader, run the following iptables command to drop traffic coming
> from the follower used in Step 1:
> iptables -A INPUT -s <Follower IP Address> -j DROP
> 3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop
> due to timeout.
> 4. On the leader, run the following iptables command to re-enable traffic
> coming from the follower used in Step 1:
> iptables -D INPUT -s <Follower IP Address> -j DROP
> 5. Watch the ZooKeeper logs on the nodes and wait for the connection to the
> leader to be reestablished.
> 6. On the follower node (used in Step 1), check the number of SyncThreads again.
> The value has increased by one and stays pinned there indefinitely:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> Reporter: Ryan Ruel
> Priority: Major
>
> In a production environment with connectivity problems, the ZooKeeper server
> was found to be using over 1000 threads named "SyncThread" that were never
> freed.
> The server logs indicate that these nodes were experiencing connection
> timeouts to the leader.
> A test environment (described in the "Environment" field of this ticket)
> showed that these connection timeouts appear to be what leaks the threads.
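
To see the thread count grow over time on a live node, the count can also be sampled periodically from /proc rather than from `ps`. The following is a sketch under assumptions, not something taken from the ticket: the function names are hypothetical, and it assumes a Linux host where each thread's name is readable from /proc/<pid>/task/<tid>/comm and where the JVM propagates the Java thread name to it (the `ps -T` output in this report suggests it does).

```shell
#!/bin/sh
# Hypothetical helper: count threads whose name matches a pattern.
# $1: a task directory such as /proc/1234/task
# $2: pattern to look for, e.g. SyncThread
count_threads_named() {
    # Each thread has a "comm" file holding its (possibly truncated) name.
    cat "$1"/*/comm 2>/dev/null | grep -c "$2" || true
}

# Hypothetical helper: print "<epoch seconds> <count>" every $3 seconds
# for as long as the task directory exists (that is, until the process exits).
watch_thread_count() {
    while [ -d "$1" ]; do
        printf '%s %s\n' "$(date +%s)" "$(count_threads_named "$1" "$2")"
        sleep "$3"
    done
}
```

For example, `watch_thread_count "/proc/$(pidof mdtzookeeper)/task" SyncThread 60` would log one sample per minute; a count that steps up after each leader reconnect and never falls would match the behaviour described here.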
--
This message was sent by Atlassian Jira
(v8.20.1#820001)