[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233292#comment-17233292
 ] 

maoling commented on ZOOKEEPER-4003:
------------------------------------

[~xiaotong.wang] Thanks for reporting more detailed context.

1. For jmap -histo:live

We can see that your zk server has 48786 *NettyServerCnxn* instances (every 
ServerCnxn represents one connection from a client to the server). This is the 
most suspicious finding, so I am fairly sure the server was handling more 
connections than it can hold.
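
To track this number over time, the histogram can be parsed with a small 
script. This is only a rough sketch: it assumes the usual column layout of 
`jmap -histo` output (rank, #instances, #bytes, class name), and 
`count_instances` is a hypothetical helper name. In the sample input only the 
instance count comes from this issue; the byte counts and the second row are 
made up.

```python
def count_instances(histo_text, class_name):
    """Return the #instances column for class_name in `jmap -histo` output, or 0."""
    for line in histo_text.splitlines():
        fields = line.split()
        # Data rows look like: "<rank>:  <#instances>  <#bytes>  <class name>"
        if len(fields) >= 4 and fields[0].endswith(":") and fields[3] == class_name:
            return int(fields[1])
    return 0


# Illustrative input: the instance count is the one quoted above,
# the byte counts and the second row are made up.
sample = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:         48786        3512592  org.apache.zookeeper.server.NettyServerCnxn
   2:          1200          96000  java.lang.String
"""

netty_cnxn_count = count_instances(
    sample, "org.apache.zookeeper.server.NettyServerCnxn"
)
print(netty_cnxn_count)
```

If the count keeps growing while the number of real clients stays the same, 
that points to connections being leaked.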

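2. For too many CLOSE_WAIT:
CLOSE_WAIT is the state of a socket whose peer has already closed the 
connection while the local side has not closed it yet. Seen from the client 
side, that means the server has actively closed the connection and the client 
has not closed it. Could you please check your application code to see 
whether you forgot to close the zk client somewhere?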
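
To narrow down which application is leaking, it may help to group the 
CLOSE_WAIT sockets by peer address instead of only counting them. The sketch 
below assumes the Linux `netstat -an` column layout (proto, recv-q, send-q, 
local address, foreign address, state). `close_wait_by_peer` is a 
hypothetical helper name and the addresses in the sample are made up.

```python
from collections import Counter


def close_wait_by_peer(netstat_text, zk_port=2181):
    """Count CLOSE_WAIT sockets per remote IP for connections on zk_port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Linux `netstat -an` TCP rows: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        local_port = local.rsplit(":", 1)[-1]
        foreign_ip, foreign_port = foreign.rsplit(":", 1)
        # Keep only sockets where one end is the ZooKeeper client port.
        if str(zk_port) in (local_port, foreign_port):
            counts[foreign_ip] += 1
    return counts


# Made-up sample output, for illustration only.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        1      0 10.0.0.5:2181           10.0.0.21:43766         CLOSE_WAIT
tcp        1      0 10.0.0.5:2181           10.0.0.21:39780         CLOSE_WAIT
tcp        0      0 10.0.0.5:2181           10.0.0.22:51000         ESTABLISHED
tcp        1      0 10.0.0.5:2181           10.0.0.23:40112         CLOSE_WAIT
"""

counts = close_wait_by_peer(sample)
for ip, n in counts.most_common():
    print(ip, n)
```

The hosts at the top of the list are the first places to look for a zk 
client that is created but never closed.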

> Zookeeper server breakdown Frequently
> -------------------------------------
>
>                 Key: ZOOKEEPER-4003
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4003
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.1
>         Environment: zookeeper version 3.5.1
>  
>            Reporter: xiaotong.wang
>            Priority: Blocker
>         Attachments: image-2020-11-13-16-51-11-960.png, 
> image-2020-11-14-18-08-18-126.png, image-2020-11-14-18-10-14-384.png, 
> jmap.PNG, screenshot-1.png
>
>
> *error log* 
> WARN [New I/O worker #16:NettyServerCnxn@400] - Closing connection to /x.x.x.x:43766
> java.io.IOException: ZK down
>  at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:337)
>  at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:243)
>  at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:165)
>  at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
>  at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>  at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>  at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>  at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:748)
>  
> and 
>  
>  
> [myid:2] - WARN [New I/O worker #15:NettyServerCnxnFactory$CnxnChannelHandler@141] - Exception caught [id: 0x9ba504cb, /x.x.x.x:39780 :> /x.x.x.x:2181] EXCEPTION: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
>  at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>  at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
>  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
>  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:151)
>  at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:292)
>  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>  at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  netstat -an|grep 2181|grep CLOSE_WAIT|wc -l
> *28441*
>  
> sample:
> !image-2020-11-13-16-51-11-960.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
