[
https://issues.apache.org/jira/browse/CASSANDRA-17461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andres de la Peña updated CASSANDRA-17461:
------------------------------------------
Description:
Intermittent failures on {{org.apache.cassandra.distributed.test.CASTest}} for
trunk:
*
[testConflictingWritesWithStaleRingInformation|https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation_3/]
*
[testSuccessfulWriteBeforeRangeMovement|https://ci-cassandra.apache.org/job/Cassandra-trunk/1025/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteBeforeRangeMovement/]
*
[testSuccessfulWriteDuringRangeMovementFollowedByConflicting|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteDuringRangeMovementFollowedByConflicting/]
*
[testSucccessfulWriteDuringRangeMovementFollowedByRead|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSucccessfulWriteDuringRangeMovementFollowedByRead/]
All four seem to have the same aspect:
{code}
Failed 2 times in the last 5 runs. Flakiness: 50%, Stability: 60%
Error Message
CAS operation timed out: received 1 of 2 required responses after 0 contention
retries
Stacktrace
org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed
out: received 1 of 2 required responses after 0 contention retries
at
org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
at
org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
at
org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
at
org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
at
org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
at
org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Standard Output
DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan these
urls:
[jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
file:/home/cassandra/cassandra/build/classes/fqltool/,
file:/home/cassandra/cassandra/build/test/classes/,
file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
...[truncated 4929659 chars]...
gService.java:519 - Waiting for messaging service to quiesce
INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1 2022-03-19
16:21:55,221 MessagingService.java:519 - Waiting for messaging service to
quiesce
INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
quiesce
{code}
Failures can also be repeatedly hit with CircleCI test multiplexer:
[https://app.circleci.com/pipelines/github/adelapena/cassandra/1394/workflows/8d40d44b-7ccb-40fe-82d5-37db0bb228a3].
The same test looks ok in 4.0, as suggested by Butler and [this repeated Circle
run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1395/workflows/5669dd1e-1a4c-4801-b1a1-c3ca04a29e2b].
was:
Intermittent failure on
{{org.apache.cassandra.distributed.test.CASTest#testConflictingWritesWithStaleRingInformation}}
for trunk:
[https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation/]
{code:java}
Failed 1 times in the last 5 runs. Flakiness: 25%, Stability: 80%
Error Message
CAS operation timed out: received 1 of 2 required responses after 0 contention
retries
Stacktrace
org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed
out: received 1 of 2 required responses after 0 contention retries
at
org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
at
org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
at
org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
at
org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
at
org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
at
org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Standard Output
DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan these
urls:
[jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
file:/home/cassandra/cassandra/build/classes/fqltool/,
file:/home/cassandra/cassandra/build/test/classes/,
file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
...[truncated 4929659 chars]...
gService.java:519 - Waiting for messaging service to quiesce
INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1 2022-03-19
16:21:55,221 MessagingService.java:519 - Waiting for messaging service to
quiesce
INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
quiesce
{code}
Failures can also be hit with CircleCI test multiplexer:
[https://app.circleci.com/pipelines/github/adelapena/cassandra/1389/workflows/0afa9d7a-30ad-44de-af8f-6e75d03c11b2]
CircleCI results show failures with a ~2% flakiness when running the test
isolated.
The same test looks ok in 4.0, as suggested by Butler and [this repeated Circle
run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1390/workflows/692e5cae-30f3-4cf9-84ac-5560bbcdcee8].
> Test Failure:
> org.apache.cassandra.distributed.test.CASTest.testConflictingWritesWithStaleRingInformation
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17461
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Priority: Normal
> Fix For: 4.x
>
>
> Intermittent failures on {{org.apache.cassandra.distributed.test.CASTest}}
> for trunk:
> *
> [testConflictingWritesWithStaleRingInformation|https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation_3/]
> *
> [testSuccessfulWriteBeforeRangeMovement|https://ci-cassandra.apache.org/job/Cassandra-trunk/1025/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteBeforeRangeMovement/]
> *
> [testSuccessfulWriteDuringRangeMovementFollowedByConflicting|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteDuringRangeMovementFollowedByConflicting/]
> *
> [testSucccessfulWriteDuringRangeMovementFollowedByRead|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSucccessfulWriteDuringRangeMovementFollowedByRead/]
> All four seem to have the same aspect:
> {code}
> Failed 2 times in the last 5 runs. Flakiness: 50%, Stability: 60%
> Error Message
> CAS operation timed out: received 1 of 2 required responses after 0
> contention retries
> Stacktrace
> org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed
> out: received 1 of 2 required responses after 0 contention retries
> at
> org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
> at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
> at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
> at
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
> at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
> at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Standard Output
> DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan
> these urls:
> [jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
> file:/home/cassandra/cassandra/build/classes/fqltool/,
> file:/home/cassandra/cassandra/build/test/classes/,
> file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
> ...[truncated 4929659 chars]...
> gService.java:519 - Waiting for messaging service to quiesce
> INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1
> 2022-03-19 16:21:55,221 MessagingService.java:519 - Waiting for messaging
> service to quiesce
> INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
> 16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
> quiesce
> {code}
> Failures can also be repeatedly hit with CircleCI test multiplexer:
> [https://app.circleci.com/pipelines/github/adelapena/cassandra/1394/workflows/8d40d44b-7ccb-40fe-82d5-37db0bb228a3].
> The same test looks ok in 4.0, as suggested by Butler and [this repeated
> Circle
> run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1395/workflows/5669dd1e-1a4c-4801-b1a1-c3ca04a29e2b].
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]