[ https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046981#comment-15046981 ]
Jim Witschey commented on CASSANDRA-10730:
------------------------------------------
I was misinterpreting the output; I haven't found evidence yet that the TCP
connection is responding before a connection fails.
Still, writing garbage over TCP works immediately after a failed
connection, and in previous debugging runs, {{netstat}} found a CQL port
listening before failed connections. (Sorry I don't have that printing again; I
was running {{netstat}} but not printing the results. My mistake.) I do think a
once-over by a driver expert could be helpful.
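A check along the lines of the {{netstat}} one can also be done from Python, so the result is printed alongside the test output. This is only a rough sketch: the helper name {{cql_port_is_open}} is made up for illustration, and 9042 is assumed as the CQL native-protocol port (Cassandra's default). It only shows whether something accepts a TCP connection on that port, not whether the CQL handshake would succeed.

```python
import socket


def cql_port_is_open(host="127.0.0.1", port=9042, timeout=1.0):
    """Return True if a TCP connection to (host, port) is accepted.

    Hypothetical helper, not part of dtest or ccm. The default port 9042
    is an assumption (Cassandra's default native-protocol port).
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (which includes timeouts and "connection refused") on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling {{print(cql_port_is_open())}} right before the driver connects would put the port state into the test's stdout.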
The results of this test:
http://cassci.datastax.com/view/Dev/view/mambocab/job/mambocab-cassandra-3.0-dtest/39/testReport/cql_tests/MiscellaneousCQLTester/cql3_insert_thrift_test/
may be easier to interpret since there's only one node. I've collected the
node's log, the node's debug log, and the test's stdout here:
https://gist.github.com/mambocab/70a4470f519a85c2c542
[~aweisberg] Could you have a look at those logs to see if there's anything I
should dig into there?
[~aholmber] Do you have a minute to look at this, or know of another Python
driver expert who could have a look? If y'all have any insights about this,
that'd be helpful, but we're hoping at least for some direction on next
debugging steps. Thanks.
> periodic timeout errors in dtest
> --------------------------------
>
> Key: CASSANDRA-10730
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1':
> OperationTimedOut('errors=Timed out creating connection (10 seconds),
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are:
> * to scrape Jenkins history to see if and how the number of tests failing
> this way has increased (it feels like it has). From there we can bisect over
> the dtests, ccm, or C*, depending on what looks like the source of the
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes
> start but don't successfully make the CQL port available.
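One rough way to start on the second item above, sketched under assumptions: poll the CQL port once a node reports that it has started, and log how long it takes before the port accepts TCP connections. The function name, the 9042 default, and the log format are all illustrative; this is not existing dtest or ccm code.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9042, deadline=30.0, interval=0.5):
    """Poll until (host, port) accepts a TCP connection.

    Returns the elapsed seconds on success, or None if `deadline` seconds
    pass first. Hypothetical helper; names and defaults are assumptions.
    """
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= deadline:
            return None
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Port is not accepting connections yet; log the attempt and retry.
            print("%.1fs: %s:%d not accepting connections" % (elapsed, host, port))
            time.sleep(interval)
```

A return value of None would mean the node started but never made the port available within the deadline, which is the case the timeout errors seem to point at.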