[ https://issues.apache.org/jira/browse/HBASE-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113749#comment-17113749 ]
Mark Robert Miller commented on HBASE-24155:
--------------------------------------------

It took me a bit longer, but I ended up tracking this down a bit further. Raising the socket cache size and expiration for HDFS helped a fair amount, but roughly 50% of the sockets were still being created. I traced many of them to *ReplicationSourceWALReader* and its reset to look for additional data to read.

> When running the tests, a tremendous number of connections are put into TIME_WAIT.
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-24155
>                 URL: https://issues.apache.org/jira/browse/HBASE-24155
>             Project: HBase
>          Issue Type: Test
>          Components: test
>            Reporter: Mark Robert Miller
>            Priority: Major
>
> When you run the test suite and monitor the number of connections in
> TIME_WAIT, it appears that a very large number of connections do not go
> through a proper connection close lifecycle, or perhaps are not properly
> reused.
> Given that connections can stay in TIME_WAIT for 1-4 minutes depending on
> OS/Env, running the tests faster or with more tests in parallel increases the
> TIME_WAIT connection buildup. Some tests spin up a very large number of
> connections, and if the wrong ones run at the same time, this can also greatly
> increase the number of connections put into TIME_WAIT. This can have a
> dramatic effect on performance (as it can take longer to create a new
> connection) or cause outright failures or timeouts.
> In my experience, a much smaller number of connections in a test suite
> would end up in TIME_WAIT when connection handling is all correct.
> Notes to come in comments below.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
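
For anyone who wants to reproduce the TIME_WAIT monitoring described in the issue, here is a minimal sketch of one way to count those sockets from Java while the tests run. It is an illustration only and not part of the HBase test tooling: the class name TimeWaitCounter and its method are hypothetical, and it assumes a Linux host where /proc/net/tcp and /proc/net/tcp6 are readable. In those files the fourth column ("st") holds the socket state in hex, and 06 is TIME_WAIT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of HBase: counts TIME_WAIT sockets on Linux.
public class TimeWaitCounter {

    // Hex state code used for TIME_WAIT in /proc/net/tcp and /proc/net/tcp6.
    static final String TIME_WAIT = "06";

    // Counts rows whose state column equals TIME_WAIT.
    // The first line is the column header and is skipped.
    static long countTimeWait(List<String> lines) {
        return lines.stream()
            .skip(1)
            .map(String::trim)
            .filter(line -> !line.isEmpty())
            .map(line -> line.split("\\s+"))
            // fields: [0]=slot, [1]=local address, [2]=remote address, [3]=state
            .filter(fields -> fields.length > 3 && TIME_WAIT.equals(fields[3]))
            .count();
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(file);
            // Skip files that do not exist or cannot be read (e.g. no IPv6).
            if (Files.isReadable(path)) {
                total += countTimeWait(Files.readAllLines(path));
            }
        }
        System.out.println("TIME_WAIT sockets: " + total);
    }
}
```

Running it periodically during a test run (for example once per second from a shell loop) gives a rough picture of the buildup over time. The count is system-wide, so connections from unrelated processes on the same host are included.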