[
https://issues.apache.org/jira/browse/CASSANDRA-17461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572968#comment-17572968
]
Benedict Elliott Smith edited comment on CASSANDRA-17461 at 7/29/22 12:57 PM:
------------------------------------------------------------------------------
Hmm, is there an issue with gossip? Have the gossip defaults been changed for
in-jvm dtests? Gossip shouldn't be running, but the nodes' gossip state seems
to be changing in the background (interleaved with the messages being sent).
{noformat}
[junit-timeout] INFO [MutationStage-1] <main> 2022-07-29 12:19:44,849
CASTestBase.java:124 - Dropping PAXOS2_PREPARE_REQ from 1 to 3
[junit-timeout] INFO [node1_GossipStage:1] node1 2022-07-29 12:19:44,896
Gossiper.java:1409 - Node /127.0.0.4:7012 is now part of the cluster
[junit-timeout] DEBUG [node1_GossipStage:1] node1 2022-07-29 12:19:44,897
StorageService.java:2887 - Node /127.0.0.4:7012 state NORMAL, token
[9223372036854775801]
[junit-timeout] DEBUG [node1_GossipStage:1] node1 2022-07-29 12:19:44,901
StorageService.java:2797 - New node /127.0.0.4:7012 at token 9223372036854775801
[junit-timeout] DEBUG [node1_GossipStage:1] node1 2022-07-29 12:19:44,914
Gossiper.java:1353 - removing expire time for endpoint : /127.0.0.4:7012
[junit-timeout] INFO [node1_GossipStage:1] node1 2022-07-29 12:19:44,914
Gossiper.java:1354 - InetAddress /127.0.0.4:7012 is now UP
[junit-timeout] INFO [MutationStage-1] <main> 2022-07-29 12:19:44,915
CASTestBase.java:124 - Dropping PAXOS2_PREPARE_REQ from 1 to 3
[junit-timeout] INFO [ReadStage-1] <main> 2022-07-29 12:19:45,536
CASTestBase.java:124 - Dropping READ_REQ from 1 to 3
{noformat}
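To check a longer test log for the same pattern, a small script can list the GossipStage entries that fall between dropped messages. This is only a sketch: the function names are hypothetical, and it assumes every entry starts with the `[junit-timeout]` prefix and that the mail client may have wrapped entries across lines, as in the excerpt above.

```python
import re

# Assumption: every log entry begins with this prefix, as in the excerpt above.
ENTRY_START = "[junit-timeout]"
# Captures the thread name that follows the log level, e.g. "node1_GossipStage:1".
THREAD_RE = re.compile(r"\b(?:INFO|DEBUG|WARN|ERROR)\s+\[([^\]]+)\]")


def join_wrapped(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(ENTRY_START) or not entries:
            entries.append(line)
        else:
            # Continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def gossip_between_drops(lines):
    """Return GossipStage entries logged between the first and last dropped message."""
    entries = join_wrapped(lines)
    drops = [i for i, entry in enumerate(entries) if " - Dropping " in entry]
    if len(drops) < 2:
        return []
    found = []
    for entry in entries[drops[0] + 1:drops[-1]]:
        match = THREAD_RE.search(entry)
        if match and "GossipStage" in match.group(1):
            found.append(entry)
    return found
```

Calling `gossip_between_drops(open(path).read().splitlines())` on a saved log (where `path` is wherever the test output was stored) returns the matching entries in log order; an empty list means no gossip activity was logged between the dropped messages.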
> Test Failure:
> org.apache.cassandra.distributed.test.CASTest.testConflictingWritesWithStaleRingInformation
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17461
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Priority: Normal
> Fix For: 4.1-beta, 4.x
>
>
> Intermittent failures on {{org.apache.cassandra.distributed.test.CASTest}}
> for trunk:
> * [testConflictingWritesWithStaleRingInformation|https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation_3/]
> * [testSuccessfulWriteBeforeRangeMovement|https://ci-cassandra.apache.org/job/Cassandra-trunk/1025/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteBeforeRangeMovement/]
> * [testSuccessfulWriteDuringRangeMovementFollowedByConflicting|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteDuringRangeMovementFollowedByConflicting/]
> * [testSucccessfulWriteDuringRangeMovementFollowedByRead|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSucccessfulWriteDuringRangeMovementFollowedByRead/]
> All four seem to fail in the same way:
> {code}
> Failed 2 times in the last 5 runs. Flakiness: 50%, Stability: 60%
> Error Message
> CAS operation timed out: received 1 of 2 required responses after 0 contention retries
> Stacktrace
> org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed out: received 1 of 2 required responses after 0 contention retries
> at org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
> at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
> at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
> at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
> at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
> at org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
> at org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
> at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
> at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
> at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Standard Output
> DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan
> these urls:
> [jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
> file:/home/cassandra/cassandra/build/classes/fqltool/,
> file:/home/cassandra/cassandra/build/test/classes/,
> file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
> ...[truncated 4929659 chars]...
> gService.java:519 - Waiting for messaging service to quiesce
> INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1
> 2022-03-19 16:21:55,221 MessagingService.java:519 - Waiting for messaging
> service to quiesce
> INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
> 16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
> quiesce
> {code}
> Failures can also be repeatedly hit with CircleCI test multiplexer:
> [https://app.circleci.com/pipelines/github/adelapena/cassandra/1394/workflows/8d40d44b-7ccb-40fe-82d5-37db0bb228a3].
> The same test looks ok in 4.0, as suggested by Butler and
> [this repeated Circle run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1395/workflows/5669dd1e-1a4c-4801-b1a1-c3ca04a29e2b].