[ https://issues.apache.org/jira/browse/AIRAVATA-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029979#comment-16029979 ]
Marcus Christie commented on AIRAVATA-2321:
-------------------------------------------

This might be a red herring, but there was a recent issue with Logstash that also generated lots of TIME_WAIT connections:
https://lists.apache.org/thread.html/acb745986b563c0eaf500b7b5d07d8aaa592735e8e75ff9d57281227@%3Cdev.airavata.apache.org%3E

It may or may not be relevant. In any case, it would be good to check Kafka/Logstash the next time we see a lot of TIME_WAIT Zookeeper connections.

> Thousands of Zookeeper client connections in TIME_WAIT
> ------------------------------------------------------
>
>                 Key: AIRAVATA-2321
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2321
>             Project: Airavata
>          Issue Type: Bug
>          Components: Airavata Orchestrator, GFac
>    Affects Versions: 0.17
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>
> On gw56.iu.xsede.org, where the develop branch of airavata is deployed, there
> are currently over 4,000 Zookeeper connections in TIME_WAIT state.
> {noformat}
> [airavata@gw56 ~]$ netstat -anp --tcp | grep 2181 | grep TIME_WAIT | wc -l
> (Not all processes could be identified, non-owned process info
>  will not be shown, you would have to be root to see it all.)
> 4758
> {noformat}
> This number has stayed fairly constant during the time I've been watching it.
> On gw77.iu.xsede.org, where the master branch is deployed, there are none of
> these TIME_WAIT connections.
> I looked into this a bit and wrote the following on HipChat:
> {quote}
> [5:41 PM] Marcus Christie: From what I've been reading, I think the TIME_WAIT
> problem must be coming from Zookeeper clients connecting and then closing
> over and over again.
> [5:42 PM] Marcus Christie: A TCP connection will stay in TIME_WAIT for about
> 4 minutes after it is closed
> http://stackoverflow.com/questions/10726049/what-is-the-reason-for-time-wait-connection-increasing-i...
> [5:44 PM] Marcus Christie: There are consistently about 4,000 connections in
> TIME_WAIT. If they hang around for 4 minutes (240 seconds), then that means
> there must be 16.667 new connections being created (and eventually closed)
> each second.
> {quote}
> Other things:
> * [~smarru] already tried purging old logs, [see the Zookeeper
> docs|https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_strengthsAndLimitations]
> * Zookeeper has [some administrative
> commands|https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands]
> that are useful for finding out its self-reported statistics about the number
> of connections, etc.
> ** to run these, do
> {noformat}
> telnet localhost 2181
> stat
> {noformat}
> * useful links on TIME_WAIT
> ** http://serverfault.com/questions/329845/how-to-forcibly-close-a-socket-in-time-wait
> ** http://stackoverflow.com/questions/10726049/what-is-the-reason-for-time-wait-connection-increasing-in-java
> ** http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
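
The rate estimate in the quoted HipChat discussion (connections sitting in TIME_WAIT divided by how long each one lingers) can be reproduced with a small sketch. This is only an illustration, not anything from the Airavata code base: the function names `count_time_wait` and `connection_rate` are hypothetical, and the parser assumes the usual Linux `netstat -an --tcp` column layout (proto, recv-q, send-q, local address, foreign address, state). Unlike `grep 2181`, it matches the port only at the end of an address, so its count could differ slightly from the pipeline in the issue. The 240-second dwell time is the figure from the chat; Linux kernels commonly hold TIME_WAIT for about 60 seconds instead, which would imply a proportionally higher connection rate.

```python
def count_time_wait(netstat_output: str, port: int = 2181) -> int:
    """Count TCP rows in TIME_WAIT whose local or foreign address ends in :port."""
    needle = f":{port}"
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto recv-q send-q local foreign state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips headers and the "Not all processes..." notice
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "TIME_WAIT" and (local.endswith(needle) or foreign.endswith(needle)):
            count += 1
    return count


def connection_rate(in_time_wait: int, time_wait_seconds: float) -> float:
    """Steady-state estimate: connections closed per second = count / dwell time."""
    if time_wait_seconds <= 0:
        raise ValueError("time_wait_seconds must be positive")
    return in_time_wait / time_wait_seconds


if __name__ == "__main__":
    # Figures from the issue: about 4,000 connections, assumed 240 s dwell time.
    print(round(connection_rate(4000, 240), 3))
    # Same count under an assumed 60 s dwell time.
    print(round(connection_rate(4000, 60), 3))
```

To use it against a live host, the output of `netstat -an --tcp` could be captured to a string (for example with `subprocess.run(..., capture_output=True, text=True)`) and passed to `count_time_wait`.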