[ https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034825#comment-15034825 ]

Ariel Weisberg commented on CASSANDRA-10730:
--------------------------------------------

Great. I am still thinking about this. A 1 GB heap is pretty big, and the RSS 
shown in top never reached 1 GB, which makes me think this might not be GC after 
all. I am wondering whether the node even gets to the point where it accepts 
client connections. We can't use the thread stacks to tell, because client 
connections are accepted non-blocking by Netty.

Maybe it accepts connections on the socket but then throws an exception in 
{{Server.Initializer}} that Netty swallows.
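One way to tell these two cases apart from the outside, without relying on thread stacks, is to probe the native transport port directly. This is a minimal sketch, assuming the default native port 9042 and native protocol v4 (which 3.0 supports); `probe_native_port` and the other names are hypothetical helpers, not part of dtest or ccm. Note that a successful TCP connect only shows that something is listening, since the kernel completes the handshake before the application calls accept. Sending an OPTIONS frame and waiting for SUPPORTED shows whether the Netty pipeline actually handles the connection.

```python
import socket
import struct

# Native protocol v3/v4 frame header: version, flags, stream id, opcode,
# body length (9 bytes, big-endian).
HEADER = struct.Struct(">BBhBI")
OPCODE_OPTIONS = 0x05
OPCODE_SUPPORTED = 0x06


def build_options_frame(stream_id: int = 0, version: int = 4) -> bytes:
    """OPTIONS has an empty body, so the request is just a header."""
    return HEADER.pack(version, 0, stream_id, OPCODE_OPTIONS, 0)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def probe_native_port(host: str, port: int = 9042, timeout: float = 10.0) -> str:
    """Classify how far a node gets in serving one client connection."""
    try:
        # create_connection also applies `timeout` to the returned socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing listening on the port yet
    except socket.timeout:
        return "connect_timeout"  # TCP handshake never completed
    with sock:
        try:
            sock.sendall(build_options_frame())
            header = recv_exact(sock, HEADER.size)
        except socket.timeout:
            return "no_response"  # listening, but no frame came back in time
        except OSError:
            return "closed"       # connection reset or closed without a reply
    version, _flags, _stream, opcode, _length = HEADER.unpack(header)
    # Response frames set the high bit of the version byte (e.g. 0x84 for v4).
    if version & 0x80 and opcode == OPCODE_SUPPORTED:
        return "ready"
    return "unexpected"
```

If a hung node reports "no_response" rather than "refused", that would point toward the connection being accepted and then dropped inside the pipeline, consistent with an exception being swallowed during channel initialization.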

Are the log files from the servers that were part of the test collected? [From 
this build I don't see a place to get the generated 
artifacts.|http://cassci.datastax.com/job/mambocab-cassandra-3.0-dtest/10/#showFailuresLink]

> periodic timeout errors in dtest
> --------------------------------
>
>                 Key: CASSANDRA-10730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>            Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1': 
> OperationTimedOut('errors=Timed out creating connection (10 seconds), 
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are:
> * to scrape Jenkins history to see if and how the number of tests failing 
> this way has increased (it feels like it has). From there we can bisect over 
> the dtests, ccm, or C*, depending on what looks like the source of the 
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes 
> start but don't successfully make the CQL port available.
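The first next step, scraping Jenkins history, could start from Jenkins' JSON remote API, where each build's JUnit report is served at `<build URL>/testReport/api/json`. This is a minimal sketch, assuming a plain (non-matrix) job whose report exposes `suites[].cases[]` with `status`, `errorDetails`, and `errorStackTrace`; the helper names and the marker string choice are hypothetical, taken from the error message quoted above.

```python
import json
import urllib.request

# Substring of the driver error quoted in the issue description.
TIMEOUT_MARKER = "Timed out creating connection"
# Jenkins marks failing cases as FAILED, or REGRESSION if they passed before.
FAILING_STATUSES = {"FAILED", "REGRESSION"}


def timeout_failures(report: dict) -> list:
    """Return names of failing cases whose error text mentions the timeout."""
    hits = []
    for suite in report.get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") not in FAILING_STATUSES:
                continue
            # Either field may be null in the JSON, hence the `or ""`.
            text = (case.get("errorDetails") or "") + (case.get("errorStackTrace") or "")
            if TIMEOUT_MARKER in text:
                hits.append(f"{case.get('className')}.{case.get('name')}")
    return hits


def fetch_report(job_url: str, build: int) -> dict:
    """Fetch one build's JUnit report through the Jenkins JSON API."""
    url = f"{job_url.rstrip('/')}/{build}/testReport/api/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def timeout_history(job_url: str, builds) -> dict:
    """Map build number -> count of timeout failures (None if no report)."""
    counts = {}
    for build in builds:
        try:
            counts[build] = len(timeout_failures(fetch_report(job_url, build)))
        except OSError:  # HTTPError/URLError, e.g. the build has no test report
            counts[build] = None
    return counts
```

For example, `timeout_history("http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest", range(300, 364))` would give a per-build count to look for a step change before bisecting; the build range here is arbitrary.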



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
