[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Ruel updated ZOOKEEPER-4428:
---------------------------------
    Description: 
In a production environment with some connectivity problems, the ZooKeeper 
server was found to be using over 1000 threads named "SyncThread" that were 
never being freed.

The server logs indicate that these nodes were experiencing connection 
timeouts to the leader.

A test environment (described below in the "environment" field of this ticket) 
showed that these connection timeouts are what seem to be leaking these threads.

  was:
In a production environment with some connectivity problems it was found the 
ZooKeeper server was using over 1000 threads with name "SyncThread" (that were 
never being freed).

Looking through the server logs indicates that these nodes were experiencing 
connection timeouts to the leader.

A test environment (described below in the "environment" field of this ticket) 
showed that these connection timeouts are what seems to be leaking these 
threads.


> ZooKeeper leaks "SyncThread" threads when leadership connection times out and 
> is reestablished 
> -----------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4428
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.3
>         Environment: 1. On a follower node of an established ZooKeeper 
> ensemble, run the following command to count the SyncThreads:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
>  2. On the leader, run the following iptables command to drop traffic coming 
> from the follower used in Step 1:
>  iptables -A INPUT -s <Follower IP Address> -j DROP
>  3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop 
> due to timeout.
>  4. On the leader, run the following iptables command to re-enable traffic 
> coming from the follower used in Step 1:
> iptables -D INPUT -s <Follower IP Address> -j DROP
>  5. Watch the ZooKeeper logs on the nodes and wait for the connection to the 
> leader to be reestablished.
>  6. On the follower node (used in Step 1), check the number of SyncThreads 
> again. The value should have increased by one and should stay there 
> indefinitely:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
>            Reporter: Ryan Ruel
>            Priority: Major
>
> In a production environment with some connectivity problems, the ZooKeeper 
> server was found to be using over 1000 threads named "SyncThread" that were 
> never being freed.
> The server logs indicate that these nodes were experiencing connection 
> timeouts to the leader.
> A test environment (described below in the "environment" field of this 
> ticket) showed that these connection timeouts are what seem to be leaking 
> these threads.
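
The thread-count check from the first and last reproduction steps could be 
wrapped in a small helper so the before and after values are easy to compare. 
The following is only a sketch: the function names count_threads and 
sample_sync_threads are hypothetical, and the process name mdtzookeeper is 
taken from the ticket and will likely differ in other deployments.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ZooKeeper.

# Count the lines on stdin that contain the given thread-name pattern.
# Intended input is the output of `ps -T -p <pid>`.
count_threads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so `|| true` keeps the helper usable under `set -e`.
    grep -c -- "$1" || true
}

# Print the current number of SyncThread threads for a given PID.
sample_sync_threads() {
    ps -T -p "$1" | count_threads SyncThread
}
```

On a follower, running sample_sync_threads "$(pidof mdtzookeeper)" before 
blocking traffic and again after the connection to the leader is reestablished 
would be expected to show the count growing by one, if the leak described in 
this ticket is present.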



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
