[
https://issues.apache.org/jira/browse/CASSANDRA-17461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551590#comment-17551590
]
Benedict Elliott Smith commented on CASSANDRA-17461:
----------------------------------------------------
I don't think the test was materially changed, just refactored to share the
cluster and to use the new Verbs. AFAICT it is behaviourally identical. But
Paxos was changed, and this perhaps affected timings. I think perhaps we impose
request timeouts more accurately now, and this could impact flakiness.
The relevant timeout to change would not be the test timeout though, but the
contention and request timeouts, as these affect how quickly these request
timeouts would fire which might be relevant in flaky VM environments.
Not sure when I'll get a chance to look more closely myself.
> Test Failure:
> org.apache.cassandra.distributed.test.CASTest.testConflictingWritesWithStaleRingInformation
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17461
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Priority: Normal
> Fix For: 4.1-beta, 4.x
>
>
> Intermittent failures on {{org.apache.cassandra.distributed.test.CASTest}}
> for trunk:
> *
> [testConflictingWritesWithStaleRingInformation|https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation_3/]
> *
> [testSuccessfulWriteBeforeRangeMovement|https://ci-cassandra.apache.org/job/Cassandra-trunk/1025/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteBeforeRangeMovement/]
> *
> [testSuccessfulWriteDuringRangeMovementFollowedByConflicting|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteDuringRangeMovementFollowedByConflicting/]
> *
> [testSucccessfulWriteDuringRangeMovementFollowedByRead|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSucccessfulWriteDuringRangeMovementFollowedByRead/]
> All four seem to have the same aspect:
> {code}
> Failed 2 times in the last 5 runs. Flakiness: 50%, Stability: 60%
> Error Message
> CAS operation timed out: received 1 of 2 required responses after 0
> contention retries
> Stacktrace
> org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed
> out: received 1 of 2 required responses after 0 contention retries
> at
> org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
> at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
> at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
> at
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
> at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
> at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Standard Output
> DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan
> these urls:
> [jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
> file:/home/cassandra/cassandra/build/classes/fqltool/,
> file:/home/cassandra/cassandra/build/test/classes/,
> file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
> ...[truncated 4929659 chars]...
> gService.java:519 - Waiting for messaging service to quiesce
> INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1
> 2022-03-19 16:21:55,221 MessagingService.java:519 - Waiting for messaging
> service to quiesce
> INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
> 16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
> quiesce
> {code}
> Failures can also be repeatedly hit with CircleCI test multiplexer:
> [https://app.circleci.com/pipelines/github/adelapena/cassandra/1394/workflows/8d40d44b-7ccb-40fe-82d5-37db0bb228a3].
> The same test looks ok in 4.0, as suggested by Butler and [this repeated
> Circle
> run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1395/workflows/5669dd1e-1a4c-4801-b1a1-c3ca04a29e2b].
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]