[
https://issues.apache.org/jira/browse/CASSANDRA-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068413#comment-15068413
]
Jim Witschey commented on CASSANDRA-10730:
------------------------------------------
We are bumping up the default instance size on CassCI to 2xlarge; we'll see
whether that reduces timeout errors. A short experiment showed big
improvements, so we'll see if that holds up.
Reasoning: I haven't made meaningful progress on this. We know more than we did
before, but not enough to make the timeouts stop. [~cdaw] brought up the point
that the way we run the dtests is sort of a degenerate case for C* operation:
we run multiple nodes on the same machine at the same time. I'm not 100%
confident that the cause of these issues is this kind of resource contention,
but it's a reasonable assumption and not really worth more of our time to try
to figure it out. Digging further is unlikely to turn up problems whose fixes
would help users, and bumping up the instance type will, we hope, fix the issue
this contention is causing for devs.
> periodic timeout errors in dtest
> --------------------------------
>
> Key: CASSANDRA-10730
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10730
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> Dtests often fail with connection timeout errors. For example:
> http://cassci.datastax.com/job/cassandra-3.1_dtest/lastCompletedBuild/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3/deletion_test/
> {code}
> ('Unable to connect to any servers', {'127.0.0.1':
> OperationTimedOut('errors=Timed out creating connection (10 seconds),
> last_host=None',)})
> {code}
> We've merged a PR to increase timeouts:
> https://github.com/riptano/cassandra-dtest/pull/663
> It doesn't look like this has improved things:
> http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/363/testReport/
> Next steps here are
> * to scrape Jenkins history to see if and how the number of tests failing
> this way has increased (it feels like it has). From there we can bisect over
> the dtests, ccm, or C*, depending on what looks like the source of the
> problem.
> * to better instrument the dtest/ccm/C* startup process to see why the nodes
> start but don't successfully make the CQL port available.
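
The two next steps above could be sketched roughly as follows. This is a minimal
illustration, not code from the dtest or ccm repositories: the helper names
count_timeout_failures and wait_for_port are hypothetical, the report is assumed
to be a dict already parsed from a Jenkins testReport JSON response, and the
marker strings are taken from the error quoted in this issue. Note that a
successful TCP connect only shows the port is open, not that a CQL handshake
would succeed.

```python
import socket
import time

# Substrings taken from the error quoted in this issue (assumed to be
# sufficient to recognise the failure mode).
TIMEOUT_MARKERS = ("OperationTimedOut", "Unable to connect to any servers")


def count_timeout_failures(report):
    """Count failed cases in a parsed Jenkins testReport dict whose error
    text looks like a connection timeout (hypothetical helper)."""
    count = 0
    for suite in report.get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") not in ("FAILED", "REGRESSION"):
                continue
            text = (case.get("errorDetails") or "") + \
                   (case.get("errorStackTrace") or "")
            if any(marker in text for marker in TIMEOUT_MARKERS):
                count += 1
    return count


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connect to (host, port) succeeds.

    Returns the seconds waited, or None if the port never opened within
    `timeout` (hypothetical helper)."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: the node is not listening yet.
            time.sleep(interval)
    return None
```

Running count_timeout_failures over each historical build's report would give a
per-build count to chart or bisect against. Calling
wait_for_port('127.0.0.1', 9042) right after a node reports that it has started
(9042 being the default CQL native port) would log how long the port takes to
open, or that it never does.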