[
https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025096#comment-15025096
]
Jim Witschey commented on CASSANDRA-10730:
------------------------------------------
That's a good idea too. I'll try that after the current {{netstat}} job runs.
I ran a job that runs {{netstat}} after each unsuccessful connection attempt.
Here's an example of a connection timeout failure:
http://cassci.datastax.com/job/mambocab-cassandra-3.0-dtest/3/testReport/cql_tests/StorageProxyCQLTester/user_test/
Here's what the output looks like after each connection attempt:
{code}
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:7000          0.0.0.0:*               LISTEN      18688/java
tcp        0      0 127.0.0.1:54555         0.0.0.0:*               LISTEN      18688/java
tcp        0      0 127.0.0.1:7100          0.0.0.0:*               LISTEN      18688/java
tcp6       0      0 127.0.0.1:9042          :::*                    LISTEN      18688/java
tcp6       0      0 :::22                   :::*                    LISTEN      -
{code}
Looking over it with [~mshuler] and [~philipthompson], we thought it was weird
that the CQL port ({{:9042}}) is listed under {{tcp6}}, i.e. as using IPv6, even
though the other ports (e.g. the thrift port) use v4. It also seems weird that
the CQL listener's local address is shown as {{127.0.0.1}} rather than {{:::}}.
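To make that check repeatable across many test runs, something like the following could flag the odd listeners automatically. This is only a sketch: the function names and the regular expression are mine, not part of dtest or ccm, and it assumes the {{netstat}} output has the column layout shown above.

```python
import re

# Matches a LISTEN row: proto, Recv-Q, Send-Q, local address, foreign address, state.
# Assumes the net-tools column order seen in the output above.
LISTEN_RE = re.compile(
    r"^(?P<proto>tcp6?)\s+\d+\s+\d+\s+(?P<local>\S+)\s+\S+\s+LISTEN\b"
)
IPV4_RE = re.compile(r"\d+(\.\d+){3}")


def listening_sockets(netstat_output):
    """Return a (proto, address, port) tuple for every LISTEN row."""
    sockets = []
    for line in netstat_output.splitlines():
        match = LISTEN_RE.match(line.strip())
        if match is None:
            continue  # header rows, wrapped PID rows, blank lines
        # Split on the last colon so IPv6 forms like ":::22" keep "::" as the address.
        address, _, port = match.group("local").rpartition(":")
        sockets.append((match.group("proto"), address, int(port)))
    return sockets


def mismatched_listeners(netstat_output):
    """Return tcp6 listeners whose local address is an IPv4 dotted quad."""
    return [
        sock for sock in listening_sockets(netstat_output)
        if sock[0] == "tcp6" and IPV4_RE.fullmatch(sock[1])
    ]
```

Run against the output above, {{mismatched_listeners}} would report only the {{127.0.0.1:9042}} row; the {{:::22}} row is {{tcp6}} with an IPv6 wildcard address, so it is not flagged.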
I'm running the test again here:
http://cassci.datastax.com/job/mambocab-cassandra-3.0-dtest/4/console
I've changed the debugging logic to see if it looks like this on successful
tests as well.
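For reference, the shape of that debugging logic is roughly as follows. This is a hedged sketch rather than the actual dtest change: it uses a plain TCP connect instead of the Python driver's CQL connection, the function names are hypothetical, and the {{netstat -lntp}} flags are my assumption about how output like the above would be produced.

```python
import socket
import subprocess


def can_connect(host, port, timeout=10.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def probe_and_snapshot(host, port, snapshot_cmd=("netstat", "-lntp"), timeout=10.0):
    """Attempt a connection, then capture listener state whether or not it worked.

    snapshot_cmd is an assumption; substitute whatever command the CI slaves provide.
    """
    connected = can_connect(host, port, timeout)
    try:
        result = subprocess.run(
            list(snapshot_cmd), capture_output=True, text=True, check=False
        )
        snapshot = result.stdout
    except OSError as exc:  # e.g. the command is not installed
        snapshot = "could not run %s: %s" % (snapshot_cmd[0], exc)
    return connected, snapshot


if __name__ == "__main__":
    ok, listeners = probe_and_snapshot("127.0.0.1", 9042)
    print("connected" if ok else "connection failed")
    print(listeners)
```

Capturing the snapshot on both outcomes is the point: if {{:9042}} also shows up under {{tcp6}} when the connection succeeds, the listing is probably unrelated to the timeouts.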
> periodic timeout errors in dtest
> --------------------------------
>
> Key: CASSANDRA-10730
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1':
> OperationTimedOut('errors=Timed out creating connection (10 seconds),
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are
> * to scrape Jenkins history to see if and how the number of tests failing
> this way has increased (it feels like it has). From there we can bisect over
> the dtests, ccm, or C*, depending on what looks like the source of the
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes
> start but don't successfully make the CQL port available.