[ https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025096#comment-15025096 ]

Jim Witschey commented on CASSANDRA-10730:
------------------------------------------

That's a good idea too. I'll try that after the current {{netstat}} job runs.

I ran a job that runs {{netstat}} after each unsuccessful connection attempt.
Here's an example of a connection timeout failure:

http://cassci.datastax.com/job/mambocab-cassandra-3.0-dtest/3/testReport/cql_tests/StorageProxyCQLTester/user_test/

Here's what the output looks like after each connection attempt:

{code}
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:7000          0.0.0.0:*               LISTEN      18688/java
tcp        0      0 127.0.0.1:54555         0.0.0.0:*               LISTEN      18688/java
tcp        0      0 127.0.0.1:7100          0.0.0.0:*               LISTEN      18688/java
tcp6       0      0 127.0.0.1:9042          :::*                    LISTEN      18688/java
tcp6       0      0 :::22                   :::*                    LISTEN      -
{code}

Looking over it with [~mshuler] and [~philipthompson], we thought it was odd
that the CQL port ({{:9042}}) is listed as IPv6 ({{tcp6}}) even though the
other ports (e.g. the thrift port) use IPv4. It also seems odd that the CQL
listener's local address is shown as {{127.0.0.1}} rather than {{:::}}.
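
For anyone who wants to check for this pattern automatically, here is a minimal Python sketch that parses output in the format shown above and flags listeners reported as {{tcp6}} with an IPv4 local address. The function names ({{parse_listeners}}, {{v4_on_v6_socket}}) and the {{SAMPLE}} constant are hypothetical and are not part of dtest or ccm; the sample rows are copied from the output above.

```python
import ipaddress

# Rows copied from the netstat output in this comment.
SAMPLE = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:7000 0.0.0.0:* LISTEN 18688/java
tcp 0 0 127.0.0.1:54555 0.0.0.0:* LISTEN 18688/java
tcp 0 0 127.0.0.1:7100 0.0.0.0:* LISTEN 18688/java
tcp6 0 0 127.0.0.1:9042 :::* LISTEN 18688/java
tcp6 0 0 :::22 :::* LISTEN -
"""


def parse_listeners(netstat_output):
    """Parse netstat listener rows into dicts with proto, host, and port."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Socket rows start with the protocol name; skip headers and
        # any wrapped continuation lines (e.g. a lone PID/program column).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        proto, local, state = fields[0], fields[3], fields[5]
        if state != "LISTEN":
            continue
        # Split on the last colon so that ":::22" yields host "::".
        host, _, port = local.rpartition(":")
        listeners.append({"proto": proto, "host": host, "port": int(port)})
    return listeners


def v4_on_v6_socket(listener):
    """Return True if a tcp6 row reports an IPv4 literal as its local address."""
    if listener["proto"] != "tcp6":
        return False
    try:
        return ipaddress.ip_address(listener["host"]).version == 4
    except ValueError:
        return False


flagged = [l["port"] for l in parse_listeners(SAMPLE) if v4_on_v6_socket(l)]
print(flagged)
```

On the sample above this flags only the CQL port; the {{:::22}} row is a {{tcp6}} listener with an IPv6 wildcard address, so it is not flagged.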

I'm running the test again here:

http://cassci.datastax.com/job/mambocab-cassandra-3.0-dtest/4/console

I've changed the debugging logic to also record the output on successful
tests, so we can see whether it looks like this there as well.
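
As a rough illustration of that kind of debugging logic (not the actual dtest change), here is a Python sketch that probes the CQL port and records a listener snapshot after every attempt, whether or not it succeeded. The helper names are hypothetical, the {{netstat -lntp}} flags are an assumption about how the output above was produced, and the probe is a plain TCP connect rather than the driver's real connection handshake. The 10-second default mirrors the timeout in the error message from the issue description.

```python
import socket
import subprocess


def try_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def snapshot_listeners(cmd=("netstat", "-lntp")):
    """Return the command's stdout, or a note if it could not be run.

    The default command is an assumption; adjust it for the CI host.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "could not run {}: {}".format(" ".join(cmd), exc)
    return result.stdout


def probe(host="127.0.0.1", port=9042, timeout=10.0):
    """Attempt a connection and capture listeners regardless of the outcome."""
    connected = try_connect(host, port, timeout)
    return {"connected": connected, "listeners": snapshot_listeners()}
```

Capturing the snapshot on both outcomes makes it possible to tell whether the {{tcp6}} listing for {{:9042}} is specific to failing runs or shows up on passing runs too.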

> periodic timeout errors in dtest
> --------------------------------
>
>                 Key: CASSANDRA-10730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>            Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (10 seconds), last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are
> * to scrape Jenkins history to see if and how the number of tests failing 
> this way has increased (it feels like it has). From there we can bisect over 
> the dtests, ccm, or C*, depending on what looks like the source of the 
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes 
> start but don't successfully make the CQL port available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
