[
https://issues.apache.org/jira/browse/CASSANDRA-17461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579160#comment-17579160
]
Ekaterina Dimitrova edited comment on CASSANDRA-17461 at 8/12/22 10:13 PM:
---------------------------------------------------------------------------
{quote}Unfortunately I have no idea how to run your repeated tests in CircleCI,
and cannot reproduce locally, so have been guided so far entirely by log
output. Perhaps you can explain to me how to use your tool, so I can run again
with TRACE output to see what might be happening in this timeout case, if it is
indeed different?
{quote}
To run the test in a loop with the job we pushed, you need to cherry-pick
[this
commit|https://github.com/adelapena/cassandra/commit/b026210655e8759e5f47d8eb072c0ae954ee3f52],
which was created by running this command:
{code:bash}
.circleci/generate.sh -m \
-e REPEATED_UTEST_TARGET=test-jvm-dtest-some \
-e REPEATED_UTEST_COUNT=500 \
-e REPEATED_UTEST_CLASS=org.apache.cassandra.distributed.test.CASTest \
-e REPEATED_UTEST_METHODS=testConflictingWritesWithStaleRingInformation{code}
Then, when you push your branch and generate the workflows in CircleCI (as with
any other patch), go to either the _java8_separate_tests_ or the
_java11_separate_tests_ workflow, depending on which JDK you want to use, and
first press _start_j8_build_ or _start_j11_build_ respectively.
Next, press and approve the corresponding repeated-test job,
_start_j8_repeated_utest_ or _start_j11_repeated_utest_.
This will run the test 500 times.
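To loop a different test, the same generate.sh invocation can be parameterized. The following is only a sketch: {{repeat_cmd}} is a hypothetical helper that is not part of the Cassandra tree, and it only prints the command rather than running it. The flags and variable names are the ones from the command above; the default target is the one used there, and other suites may need a different REPEATED_UTEST_TARGET.

```shell
#!/bin/sh
# Hypothetical helper (not in the Cassandra tree): prints the generate.sh
# command for a repeated-test run so it can be reviewed before running it.
#   $1 = fully qualified test class
#   $2 = test method
#   $3 = repeat count (defaults to 500, as in the command above)
#   $4 = ant target   (defaults to test-jvm-dtest-some, as in the command above)
repeat_cmd() {
  printf '%s' '.circleci/generate.sh -m'
  # printf reuses the format string once per remaining argument,
  # so each variable gets its own " -e NAME=value" pair.
  printf ' -e %s' \
    "REPEATED_UTEST_TARGET=${4:-test-jvm-dtest-some}" \
    "REPEATED_UTEST_COUNT=${3:-500}" \
    "REPEATED_UTEST_CLASS=$1" \
    "REPEATED_UTEST_METHODS=$2"
  printf '\n'
}

# Print the command for the test discussed in this ticket.
repeat_cmd org.apache.cassandra.distributed.test.CASTest \
  testConflictingWritesWithStaleRingInformation
```

Run the printed command from the root of your checkout, commit the regenerated CircleCI config, and then follow the workflow steps above.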
There is also a
[readme|https://github.com/adelapena/cassandra/blob/trunk/.circleci/readme.md]
in the in-tree CircleCI folder, and more information and examples of how to run
tests from the different suites in a loop can be found in
[config-2_1.yml|https://github.com/adelapena/cassandra/blob/trunk/.circleci/config-2_1.yml#L47-L99].
These jobs are really useful and have saved us a lot of time with flaky tests. I
would strongly recommend that everyone spend 15 minutes reading through these
documents. Please let me know if you have any questions or concerns; I will be
happy to help.
> Test Failure:
> org.apache.cassandra.distributed.test.CASTest.testConflictingWritesWithStaleRingInformation
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17461
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Priority: Normal
> Fix For: 4.1-beta, 4.x
>
>
> Intermittent failures on {{org.apache.cassandra.distributed.test.CASTest}}
> for trunk:
> *
> [testConflictingWritesWithStaleRingInformation|https://ci-cassandra.apache.org/job/Cassandra-trunk/1024/testReport/org.apache.cassandra.distributed.test/CASTest/testConflictingWritesWithStaleRingInformation_3/]
> *
> [testSuccessfulWriteBeforeRangeMovement|https://ci-cassandra.apache.org/job/Cassandra-trunk/1025/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteBeforeRangeMovement/]
> *
> [testSuccessfulWriteDuringRangeMovementFollowedByConflicting|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSuccessfulWriteDuringRangeMovementFollowedByConflicting/]
> *
> [testSucccessfulWriteDuringRangeMovementFollowedByRead|https://ci-cassandra.apache.org/job/Cassandra-trunk/1020/testReport/org.apache.cassandra.distributed.test/CASTest/testSucccessfulWriteDuringRangeMovementFollowedByRead/]
> All four seem to fail in the same way:
> {code}
> Failed 2 times in the last 5 runs. Flakiness: 50%, Stability: 60%
> Error Message
> CAS operation timed out: received 1 of 2 required responses after 0
> contention retries
> Stacktrace
> org.apache.cassandra.exceptions.CasWriteTimeoutException: CAS operation timed
> out: received 1 of 2 required responses after 0 contention retries
> at
> org.apache.cassandra.service.paxos.Paxos$MaybeFailure.markAndThrowAsTimeoutOrFailure(Paxos.java:547)
> at org.apache.cassandra.service.paxos.Paxos.begin(Paxos.java:1048)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:659)
> at org.apache.cassandra.service.paxos.Paxos.cas(Paxos.java:618)
> at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:307)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:500)
> at
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:467)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
> at
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
> at
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
> at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
> at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Standard Output
> DEBUG [main] 2022-03-19 16:20:42,868 Reflections.java:198 - going to scan
> these urls:
> [jar:file:/home/cassandra/cassandra/build/apache-cassandra-4.1-SNAPSHOT.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/simulator-bootstrap.jar!/,
>
> jar:file:/home/cassandra/cassandra/build/test/lib/jars/dtest-api-0.0.12.jar!/,
> file:/home/cassandra/cassandra/build/classes/fqltool/,
> file:/home/cassandra/cassandra/build/test/classes/,
> file:/home/cassandra/cassandra/build/classes/main/, file:/home/cass
> ...[truncated 4929659 chars]...
> gService.java:519 - Waiting for messaging service to quiesce
> INFO [node1_isolatedExecutor:10] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node1_isolatedExecutor:10] node1
> 2022-03-19 16:21:55,221 MessagingService.java:519 - Waiting for messaging
> service to quiesce
> INFO [node2_isolatedExecutor:8] 2022-03-19 16:21:55,223
> SubstituteLogger.java:169 - INFO [node2_isolatedExecutor:8] node2 2022-03-19
> 16:21:55,222 MessagingService.java:519 - Waiting for messaging service to
> quiesce
> {code}
> Failures can also be repeatedly hit with CircleCI test multiplexer:
> [https://app.circleci.com/pipelines/github/adelapena/cassandra/1394/workflows/8d40d44b-7ccb-40fe-82d5-37db0bb228a3].
> The same test looks ok in 4.0, as suggested by Butler and [this repeated
> Circle
> run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1395/workflows/5669dd1e-1a4c-4801-b1a1-c3ca04a29e2b].