[ 
https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046981#comment-15046981
 ] 

Jim Witschey commented on CASSANDRA-10730:
------------------------------------------

I was misinterpreting the output; I haven't found evidence yet that the TCP 
connection is responding before a connection fails.

Still, writing garbage over TCP works immediately after a failed 
connection, and in previous debugging runs, {{netstat}} found a CQL port 
listening before failed connections. (Sorry, I don't have that output to share; I 
was running {{netstat}} but not printing the results. My mistake.) I do think a 
once-over by a driver expert could be helpful.
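
For anyone who wants to repeat the "write garbage over TCP" check, it can be sketched roughly as below. This is not the code from the debugging runs; the function name {{probe_tcp_port}}, the payload, and the 2-second timeout are hypothetical choices for illustration. It only shows whether the port accepts a TCP connection and bytes, not whether the CQL handshake would succeed.

```python
import socket


def probe_tcp_port(host, port, payload=b"not a CQL frame\n", timeout=2.0):
    """One-shot TCP probe (hypothetical helper, not part of dtest or ccm).

    Returns (connected, sent):
      connected -- the TCP connect completed within `timeout`
      sent      -- the payload was handed to the socket without error
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections and connect timeouts alike.
        return (False, False)
    try:
        sock.sendall(payload)
        return (True, True)
    except OSError:
        return (True, False)
    finally:
        sock.close()
```

Calling {{probe_tcp_port("127.0.0.1", 9042)}} right after a failed driver connection would give a result comparable to the one described above, assuming the node uses the default native transport port 9042.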

The results of this test:

http://cassci.datastax.com/view/Dev/view/mambocab/job/mambocab-cassandra-3.0-dtest/39/testReport/cql_tests/MiscellaneousCQLTester/cql3_insert_thrift_test/

may be easier to interpret since there's only one node. I've collected the 
node's log, the node's debug log, and the test's stdout here:

https://gist.github.com/mambocab/70a4470f519a85c2c542

[~aweisberg] Could you have a look at those logs to see if there's anything I 
should dig into there?

[~aholmber] Do you have a minute to look at this, or know of another Python 
driver expert who could have a look? If y'all have any insights about this, 
that'd be helpful, but we're hoping at least for some direction on next 
debugging steps. Thanks.

> periodic timeout errors in dtest
> --------------------------------
>
>                 Key: CASSANDRA-10730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>            Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1': 
> OperationTimedOut('errors=Timed out creating connection (10 seconds), 
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are
> * to scrape Jenkins history to see if and how the number of tests failing 
> this way has increased (it feels like it has). From there we can bisect over 
> the dtests, ccm, or C*, depending on what looks like the source of the 
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes 
> start but don't successfully make the CQL port available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
