[
https://issues.apache.org/jira/browse/CASSANDRA-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353032#comment-15353032
]
Paulo Motta commented on CASSANDRA-11414:
-----------------------------------------
Since this test kills a node at random points during streaming, it exposed
various errors and race conditions that caused the test to fail. The basic idea
here is to improve synchronization to avoid these races when a node is killed
in the middle of a stream session. With that in mind, I made the following improvements:
* 2.2+
** Add null protection on {{ConnectionHandler.signalCloseDone}}
** Fail the stream session on {{SocketException}}; previously it was not
failed, which could cause it to hang on broken connections
* 3.0+
** Synchronize access to transaction on {{StreamReceiveTask}}
** Abort {{SSTableWriter}} if received after {{StreamReceiveTask}} is finished
** Abort {{SSTableWriter}} if there's a failure during finalization on
{{StreamReceiveTask}}
** Synchronize access to {{StreamSession}} methods: {{prepareReceiving}},
{{addTransferFiles}} and {{addTransferRanges}}, so they don't race with
{{onError}}, since that will try to abort active tasks.
*** Throw an exception if any of these is executed after the stream session is
finished (added tests on {{StreamReceiveTask}})
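To illustrate the 3.0+ items, here is a minimal sketch of the pattern rather than the actual patch. All class and method names below ({{SketchWriter}}, {{SketchReceiveTask}}, {{SketchSession}}) are hypothetical stand-ins, and the real Cassandra classes have different signatures. It assumes a simple monitor per object: session methods are synchronized so they cannot race with {{onError}}, a finished session rejects new work, and a writer that arrives after the task is done gets aborted instead of leaking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an SSTable writer; it only records whether it was aborted.
final class SketchWriter {
    private boolean aborted;

    synchronized void abort() { aborted = true; }

    synchronized boolean isAborted() { return aborted; }
}

// Hypothetical receive task: all access to its state goes through the task's monitor.
final class SketchReceiveTask {
    private final List<SketchWriter> writers = new ArrayList<>();
    private boolean done;

    // Returns true if the writer was accepted; a late writer is aborted and rejected.
    synchronized boolean received(SketchWriter writer) {
        if (done) {
            writer.abort();
            return false;
        }
        writers.add(writer);
        return true;
    }

    // Marks the task as done and aborts every writer collected so far.
    synchronized void abort() {
        if (done) return;
        done = true;
        for (SketchWriter w : writers) w.abort();
        writers.clear();
    }
}

// Hypothetical session: prepareReceiving and onError share one lock, so they cannot interleave.
final class SketchSession {
    private final List<SketchReceiveTask> tasks = new ArrayList<>();
    private boolean finished;

    synchronized SketchReceiveTask prepareReceiving() {
        if (finished) throw new IllegalStateException("stream session already finished");
        SketchReceiveTask task = new SketchReceiveTask();
        tasks.add(task);
        return task;
    }

    synchronized void onError() {
        if (finished) return;
        finished = true;
        for (SketchReceiveTask t : tasks) t.abort();
    }
}

final class StreamSyncSketch {
    public static void main(String[] args) {
        SketchSession session = new SketchSession();
        SketchReceiveTask task = session.prepareReceiving();
        SketchWriter early = new SketchWriter();
        task.received(early);

        session.onError(); // simulates the peer being killed mid-stream

        SketchWriter late = new SketchWriter();
        System.out.println("late writer accepted: " + task.received(late));
        System.out.println("early aborted: " + early.isAborted() + ", late aborted: " + late.isAborted());
    }
}
```

Locks are always taken in the order session, task, writer, so the sketch itself cannot deadlock; whether the same ordering holds in the real code is something the review should confirm.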
After these were addressed, the number of failures went down from 28/100
to 11/100 on this [multiplexer
job|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/149/].
The remaining failures are due to bad timing in the dtest itself, so I [updated the
dtest|https://github.com/riptano/cassandra-dtest/pull/1051/commits/51ed5f55c85a3a1c339b265ac4b056137215e5fd]
to address those and submitted a new multiplexer run (still queued).
Patch and tests available below:
||2.2||3.0||3.9||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11414]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11414]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.9...pauloricardomg:3.9-11414]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11414]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11414]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11414-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11414-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-11414-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11414-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11414-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11414-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-11414-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11414-dtest/lastCompletedBuild/testReport/]|
Will set to Patch Available once the new multiplexer and CI runs look good.
> dtest failure in bootstrap_test.TestBootstrap.resumable_bootstrap_test
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-11414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11414
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Philip Thompson
> Assignee: Paulo Motta
> Labels: dtest
> Fix For: 3.x
>
>
> Stress is failing to read back all data. We can see this output from the
> stress read
> {code}
> java.io.IOException: Operation x0 on key(s) [314c384f304f4c325030]: Data returned was not validated
>     at org.apache.cassandra.stress.Operation.error(Operation.java:138)
>     at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:116)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:101)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
>     at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> java.io.IOException: Operation x0 on key(s) [33383438363931353131]: Data returned was not validated
>     at org.apache.cassandra.stress.Operation.error(Operation.java:138)
>     at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:116)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:101)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
>     at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
>     at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> FAILURE
> {code}
> Started happening with build 1075. Does not appear flaky on CI.
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1076/testReport/bootstrap_test/TestBootstrap/resumable_bootstrap_test
> Failed on CassCI build trunk_dtest #1076
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)