[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719736#comment-16719736
 ] 

Michael Han commented on ZOOKEEPER-3211:
----------------------------------------

{quote}Have similar defects been solved in 3.4.13?
{quote}
There have been earlier reports about CLOSE_WAIT, but if I remember correctly, 
most of them ended with no action taken because the problem was hard to reproduce. 
{quote}It looks like zk Server is deadlocked
{quote}
The thread dump in the 1.log file shows that some threads are blocked, but that 
looks like a symptom rather than the cause. If the server runs out of available 
sockets, ZooKeeper threads that do file I/O or socket I/O will block. 
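
One way to check whether the server really is piling up CLOSE_WAIT sockets is to count them per listening port. The following is a minimal sketch, assuming a Linux host where /proc/net/tcp is readable; the function name count_close_wait is hypothetical, and port 2181 in the usage note is only the usual ZooKeeper client port, not something confirmed by this report.

```python
# State code "08" in /proc/net/tcp is CLOSE_WAIT (Linux TCP state numbering).
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_text, local_port):
    """Count sockets in CLOSE_WAIT whose local port equals local_port.

    proc_net_tcp_text is the full text of /proc/net/tcp (or /proc/net/tcp6).
    """
    count = 0
    # The first line is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        # Addresses look like HEXIP:HEXPORT; the port is after the last colon.
        port = int(local_addr.rsplit(":", 1)[1], 16)
        if port == local_port and state == CLOSE_WAIT:
            count += 1
    return count
```

For example, count_close_wait(open("/proc/net/tcp").read(), 2181) gives the number for IPv4 sockets. A JVM often listens on an IPv6 socket, so it may be worth running the same function over /proc/net/tcp6 as well. A number that keeps growing until it reaches the connection or descriptor limit would be consistent with a leak.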

{quote}Does this cause CLOSE_WAIT for zk?
{quote}
Most of the time, long-lived CLOSE_WAIT connections indicate an application-side 
bug rather than a kernel bug: the client has closed its end of the connection 
(the server has received the TCP FIN), but for some reason the application never 
closes its own socket, which effectively leaks the connection. The kernel 
upgrade could be a trigger, though. 
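
To illustrate the mechanism, here is a minimal sketch on the loopback interface. It is not ZooKeeper code; demo_half_closed is a hypothetical name. The accepted socket sits in CLOSE_WAIT from the moment the client closes until the application calls close() on it, and the file descriptor stays allocated for that whole time.

```python
import socket


def demo_half_closed():
    """Return (data read after the client closed, fd held by the server side)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # avoid blocking forever if something goes wrong

    # The client closes first: its FIN moves server_side into CLOSE_WAIT.
    client.close()

    # An empty read is how the application learns that the peer has closed.
    data = server_side.recv(1024)

    # The descriptor is still held here. If the application forgot the
    # close() below, the socket would stay in CLOSE_WAIT indefinitely.
    held_fd = server_side.fileno()

    server_side.close()
    listener.close()
    return data, held_fd
```

A server that sees the empty read (or an I/O error) and skips the close() call on some code path would show exactly the symptom described in this issue: the number of CLOSE_WAIT sockets grows until the limit is hit.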

I am interested to know if any other folks can reproduce this. I currently 
don't have the environment to reproduce this.

Also, [~yss] can you please use a zip file instead of a rar file when uploading 
log files? 

Another thing to try is to increase your limit on open file descriptors - it 
seems it's currently set to 60? If you increase it (ulimit), you could still end 
up leaking connections, but the server should stay available longer before it 
runs out of sockets.
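
As a rough sketch of what checking and raising the limit looks like from inside a process, the snippet below uses Python's standard resource module (Unix only). The helper names are hypothetical. It only affects the process that runs it and its children; for the ZooKeeper server itself the equivalent would be ulimit -n in the shell or init script that starts the JVM.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward target, capped at the hard limit.

    Returns the soft limit that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling nofile_limits() before starting the server shows which values a child process would inherit. Raising the limit only buys time if connections are still being leaked.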

> zookeeper standalone mode,found a high level bug in kernel of centos7.0 
> ,zookeeper Server's  tcp/ip socket connections(default 60 ) are CLOSE_WAIT 
> ,this lead to zk can't work for client any more
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3211
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3211
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: 1.zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel
> kernel:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
>            Reporter: yeshuangshuang
>            Priority: Blocker
>             Fix For: 3.4.5
>
>         Attachments: 1.log, zklog.rar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1.config--zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel version
> version:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
> 3.bug details:
> The problem is intermittent, but the probability of recurrence is extremely 
> high. At first, reads and writes time out after about 6s, and after a few 
> minutes all connections (including long-lived ones) are in the CLOSE_WAIT state.
> 4.Workaround: when all connections are found to be in CLOSE_WAIT, manually 
> restart the ZooKeeper server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
