[
https://issues.apache.org/jira/browse/CASSANDRA-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494099#comment-17494099
]
Bernardo Botella Corbi edited comment on CASSANDRA-17335 at 2/17/22, 5:30 PM:
------------------------------------------------------------------------------
Found a fix for this issue. Can be found here:
[https://github.com/apache/cassandra/pull/1452/files]
As a backup, I am also attaching a patch.
[^0001-Fix-Flaky-testNoSuchRepairSessionAnticompaction-trunk.patch]
The problem is that the session state sometimes changed to FAILED between the
moment it was checked as not failed and the moment it was updated, leading to
an illegal transition (from FAILED to PREPARED).
I was able to reproduce it in two ways:
* Telling IntelliJ to keep running the same test until it failed. It always
failed after 2 to 6 runs.
* Running a simple bash script until it failed. Same results.
That allowed me to investigate the logs and track down the cause. After adding
the synchronized block, I couldn’t make the test fail again. Thinking about it,
it also makes sense to put that code into a synchronized block, as it reads a
variable that is updated from other threads.
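To illustrate the race described above, here is a minimal sketch of a check-then-act update guarded by a lock. The class and method names (SessionSketch, markPrepared, fail) are hypothetical and are not taken from the Cassandra code or from the linked pull request. The sketch only shows why the check and the update need to happen under the same monitor.

```java
/** Hypothetical stand-in for a repair session; NOT the actual Cassandra class. */
class SessionSketch {
    enum State { PREPARING, PREPARED, FAILED }

    private State state = State.PREPARING;

    /** May be called from another thread when the session fails. */
    synchronized void fail() {
        state = State.FAILED;
    }

    /**
     * Checks and updates the state while holding the same lock as fail(),
     * so fail() cannot run between the check and the assignment.
     * Returns true if the transition to PREPARED happened.
     */
    synchronized boolean markPrepared() {
        if (state == State.FAILED) {
            return false; // refuse the illegal FAILED -> PREPARED transition
        }
        state = State.PREPARED;
        return true;
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        SessionSketch session = new SessionSketch();
        Thread failer = new Thread(session::fail);
        Thread preparer = new Thread(session::markPrepared);
        failer.start();
        preparer.start();
        failer.join();
        preparer.join();
        // Whichever thread wins, a FAILED state is never overwritten.
        System.out.println(session.state());
    }
}
```

Without the synchronized keyword on markPrepared, another thread could call fail() after the check passes but before the assignment, which is the interleaving that produced the flaky failure.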
was (Author: JIRAUSER285406):
Found a fix for this issue. Can be found here:
[https://github.com/apache/cassandra/compare/trunk...bbotella:CASSANDRA-17335-trunk]
> Flaky testNoSuchRepairSessionAnticompaction
> -------------------------------------------
>
> Key: CASSANDRA-17335
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17335
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Assignee: Bernardo Botella Corbi
> Priority: Normal
> Attachments:
> 0001-Fix-Flaky-testNoSuchRepairSessionAnticompaction-trunk.patch
>
>
> The in-JVM dtest {{RepairErrorsTest#testNoSuchRepairSessionAnticompaction}}
> seems to be flaky, as shown by [this repeated
> run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1280/workflows/8a4e04cb-64cc-46a3-9e1e-c946dfafc7fa/jobs/12114]
> on trunk, which hits 18 failures in 500 iterations. The config for CircleCI
> was generated with:
> {code}
> .circleci/generate.sh -m \
> -e REPEATED_UTEST_TARGET=test-jvm-dtest-some \
> -e REPEATED_UTEST_COUNT=500 \
> -e REPEATED_UTEST_CLASS=org.apache.cassandra.distributed.test.RepairErrorsTest
> {code}
> This was discovered while testing CASSANDRA-16878, on [this CI
> run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1268/workflows/aef1c703-c816-40f8-8e07-9055027d6403/jobs/12000].
> The error consists of a failed assertion when grepping the logs in search of
> an error message.