[jira] [Commented] (CASSANDRA-15650) Fix flaky test org.apache.cassandra.distributed.test.*RepairCoordinatorFastTest

David Capwell (Jira) Wed, 25 Mar 2020 09:53:19 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066854#comment-17066854
 ]


David Capwell commented on CASSANDRA-15650:
-------------------------------------------

To help with the review: several things changed, but only a handful actually 
fix the problem.

1) There is a bug in nodetool repair (client side): the error notification 
may not be seen (expected behavior, since notifications are lossy) while the 
complete notification is.  In that case the return code is 0 even though the 
message says there was a failure.  I changed it so that we double-check for 
errors.
2) The slow tests take 6m10s on my laptop with only 2 cores running; the dtest 
timeout is 6m.  I split the tests into 2 (which means 3 new implementations: 
preview, IR, full), which brings the run down to 4m30s under the same settings.
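
A minimal sketch of the double check described in 1), assuming a client that 
receives separate error and complete notifications.  The class name 
RepairOutcomeTracker and the failureLookup supplier are hypothetical and are 
not taken from the actual nodetool code; the supplier stands in for whatever 
call asks the coordinator for the final repair status.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the actual nodetool implementation: tracks
 * repair notifications on the client side and derives the exit code.
 */
final class RepairOutcomeTracker {
    // Assumption: some way to ask the coordinator whether the repair failed.
    private final Supplier<Optional<String>> failureLookup;
    // Stays null until a failure is known.
    private volatile String error;

    RepairOutcomeTracker(Supplier<Optional<String>> failureLookup) {
        this.failureLookup = failureLookup;
    }

    /** Called if the error notification arrives; it may be dropped. */
    void onError(String message) {
        error = message;
    }

    /** Called when the complete notification arrives. */
    void onComplete() {
        if (error == null) {
            // No error was seen, but that may only mean the notification
            // was lost, so check for a failure explicitly before finishing.
            failureLookup.get().ifPresent(message -> error = message);
        }
    }

    /** 0 only if no failure was seen or found by the double check. */
    int exitCode() {
        return error == null ? 0 : 1;
    }
}
```

With this shape, a dropped error notification no longer yields exit code 0, 
provided the lookup reports the failure.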

The other changes are:
1) Better assert messages.  These were useful for seeing issue #1, and in 
general offer more detail than before.
2) Thread renaming.  This is an operational improvement, as it makes it easier 
to see which threads are running which repair.  It ties in well with the 
improvements to #2, since it shows which repair timed out rather than just 
"there exists a repair".
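
As an illustration of the thread renaming idea, here is a rough sketch that 
runs a task under a descriptive thread name and restores the old name 
afterwards.  The helper NamedTasks and the example name "Repair-1234" are 
hypothetical; the actual patch may name threads differently.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not taken from the patch. */
final class NamedTasks {
    private NamedTasks() {}

    /**
     * Runs the task with the current thread temporarily renamed, so that
     * thread dumps and timeout reports show which repair it belongs to.
     */
    static <T> T callWithName(String name, Supplier<T> task) {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(name);
        try {
            return task.get();
        } finally {
            // Restore the name so pooled threads do not keep a stale label.
            current.setName(original);
        }
    }
}
```

For example, NamedTasks.callWithName("Repair-1234", () -> runRepair()) would 
make a stuck repair identifiable by name, where runRepair() is a placeholder 
for the real work.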

> Fix flaky test 
> org.apache.cassandra.distributed.test.*RepairCoordinatorFastTest
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15650
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15650
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0-alpha
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Test failure: 
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/177/workflows/3dff37a5-9bf4-40e2-8d5b-f127b416dc79/jobs/862
> {code}
> [junit-timeout] Testcase: 
> onlyCoordinator[SEQUENTIAL/true](org.apache.cassandra.distributed.test.FullRepairCoordinatorFastTest):
>       FAILED
> [junit-timeout] nodetool command repair was successful but not expected to 
> be. Actual: 0
> [junit-timeout] junit.framework.AssertionFailedError: nodetool command repair 
> was successful but not expected to be. Actual: 0
> [junit-timeout]       at 
> org.apache.cassandra.distributed.api.NodeToolResult$Asserts.failure(NodeToolResult.java:76)
> [junit-timeout]       at 
> org.apache.cassandra.distributed.test.RepairCoordinatorFast.onlyCoordinator(RepairCoordinatorFast.java:255)
> [junit-timeout]       at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [junit-timeout]       at java.lang.Thread.run(Thread.java:748)
> {code}
> [Circle CI 
> LOWER|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorTestFlaky]



