[ 
https://issues.apache.org/jira/browse/HBASE-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082353#comment-17082353
 ] 

Mark Robert Miller commented on HBASE-24155:
--------------------------------------------

Still doing a little digging before I dump more info.

Basically, the more JVMs I run in parallel to make the tests faster, the more 
often I hit one particular failure across a wide variety of tests: the test 
times out.

Looking at resource usage, the only thing that seems to approach or exceed 
limits is the number of connections that end up in TIME_WAIT. It looks like 
some tests are creating a huge number of connections. If I ignore 
enough of the tests that end up hanging, I can run the remaining 95% of the 
tests in as many JVMs as I have RAM for. I'm narrowing down which tests 
create the most connections so that I can inspect them more closely.
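
For anyone who wants to watch the same number while the tests run, here is a 
minimal sketch of one way to count TIME_WAIT sockets. It is not necessarily 
how the measurement above was taken. It assumes Linux, where /proc/net/tcp 
lists one socket per line and the "st" column holds the state in hex (06 is 
TIME_WAIT). The function name count_time_wait and the choice to group by 
remote port are illustrative assumptions.

```python
from collections import Counter

# Hex code for TCP_TIME_WAIT in the "st" column of Linux /proc/net/tcp.
TIME_WAIT = "06"


def count_time_wait(proc_net_tcp_text):
    """Return (total, per_remote_port) for sockets in TIME_WAIT.

    Hypothetical helper: takes the text of /proc/net/tcp (or tcp6).
    """
    per_port = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        # Addresses look like "0100007F:1F90": hex IP, colon, hex port.
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        per_port[remote_port] += 1
    return sum(per_port.values()), per_port
```

Calling it on the contents of /proc/net/tcp before and after a single test 
gives a rough idea of how many connections that test left behind. Grouping 
by remote port is only a guess at a useful key; local port or remote address 
may work better depending on which side closes the connection first.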

> When running the tests, a tremendous number of connections are put into 
> TIME_WAIT.
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-24155
>                 URL: https://issues.apache.org/jira/browse/HBASE-24155
>             Project: HBase
>          Issue Type: Test
>          Components: test
>            Reporter: Mark Robert Miller
>            Priority: Major
>
> When you run the test suite and monitor the number of connections in 
> TIME_WAIT, it appears that a very large number of connections do not end up 
> with a proper connection close lifecycle or perhaps proper reuse.
> Given that connections can stay in TIME_WAIT for 1 to 4 minutes depending on 
> OS/Env, running the tests faster or with more tests in parallel increases the 
> TIME_WAIT connection buildup. Some tests spin up a very, very large number of 
> connections and if the wrong ones run at the same time, this can also greatly 
> increase the number of connections put into TIME_WAIT. This can have a 
> dramatic effect on performance (as it can take longer to create a new 
> connection) or cause tests to fail outright or time out.
> In my experience, a much, much smaller number of connections in a test suite 
> would end up in TIME_WAIT when connection handling is all correct.
> Notes to come in comments below.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
