[
https://issues.apache.org/jira/browse/CASSANDRA-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320548#comment-17320548
]
Adam Holmberg commented on CASSANDRA-16586:
-------------------------------------------
As shown in the error in the description, we're occasionally bumping into
coordinator timeouts, which are [set to a smaller
value|https://github.com/apache/cassandra/blob/6665fc29b33abcc26aad4cecbfee88225b0a7225/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeAvailabilityTestBase.java#L64-L65]
by the test. I assume they were set lower to make the test time out faster when
we are [expecting a
failure|https://github.com/apache/cassandra/blob/6665fc29b33abcc26aad4cecbfee88225b0a7225/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeAvailabilityTestBase.java#L113].
However, simply raising them then causes the test to occasionally fail in a
different way ({{Failure}} instead of {{Timeout}}) if the node shutdown occurs
mid-request.
I'm pushing a potential patch that makes sure the coordinator sees the node as
down, then expects an {{UnavailableException}}. I'm not sure if/why the
{{Timeout}} exception would be essential to this test, but if it is preferred
we could look at another way of inducing the timeout deterministically, so that
it does not introduce flakiness.
[patch|https://github.com/aholmberg/cassandra/pull/54]
[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-16586]
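For illustration only, the approach described above (wait until the coordinator sees the node as down, then expect an unavailable error rather than a timeout) can be sketched without any Cassandra code. This is not the linked patch: {{FakeCoordinator}}, {{awaitTrue}} and the local {{UnavailableException}} class are hypothetical stand-ins, and the node shutdown is simulated by a background thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class DownNodeSketch {
    /** Hypothetical stand-in for Cassandra's UnavailableException. */
    static class UnavailableException extends RuntimeException {
        UnavailableException(String message) { super(message); }
    }

    /** Hypothetical coordinator that only tracks how many replicas it believes are alive. */
    static class FakeCoordinator {
        private final int replicas;
        private final AtomicInteger alive;

        FakeCoordinator(int replicas) {
            this.replicas = replicas;
            this.alive = new AtomicInteger(replicas);
        }

        void markDown() { alive.decrementAndGet(); }

        int aliveReplicas() { return alive.get(); }

        /** Rejects the read up front when fewer than a quorum of replicas look alive. */
        String readAtQuorum() {
            int quorum = replicas / 2 + 1;
            if (alive.get() < quorum)
                throw new UnavailableException("need " + quorum + " replicas, " + alive.get() + " alive");
            return "row";
        }
    }

    /** Polls the condition until it holds, failing loudly if the deadline passes. */
    static void awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0)
                throw new AssertionError("condition not met within " + timeoutMillis + " ms");
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    /** Returns true if a quorum read is rejected as unavailable once the shutdowns are visible. */
    static boolean quorumReadIsUnavailable(int replicas, int nodesDown) {
        FakeCoordinator coordinator = new FakeCoordinator(replicas);

        // Simulated shutdown: the coordinator learns about each down node only after a delay.
        Thread shutdown = new Thread(() -> {
            for (int i = 0; i < nodesDown; i++) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                coordinator.markDown();
            }
        });
        shutdown.start();

        // Key step: do not issue the request until the coordinator's view has caught up.
        awaitTrue(() -> coordinator.aliveReplicas() == replicas - nodesDown, 5_000);

        try {
            coordinator.readAtQuorum();
            return false;
        } catch (UnavailableException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("2 of 3 down, unavailable: " + quorumReadIsUnavailable(3, 2));
        System.out.println("1 of 3 down, unavailable: " + quorumReadIsUnavailable(3, 1));
    }
}
```

Because the request is only issued after the coordinator's view is settled, the outcome no longer depends on how the shutdown races with an in-flight request, which is the source of the intermittent {{Timeout}} and {{Failure}} results.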
> Fix flaky test testAvailabilityV30ToV4 -
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> --------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16586
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16586
> Project: Cassandra
> Issue Type: Bug
> Components: CI
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/881/workflows/8e477260-ac6a-4eab-b4be-cbc048199565/jobs/5269
> testAvailabilityV30ToV4 -
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> {code}
> junit.framework.AssertionFailedError: Unexpected error in case QUORUM-QUORUM with not upgraded coordinator and 1 nodes down
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:127)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$2(MixedModeAvailabilityTestBase.java:79)
> at org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:186)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:81)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:53)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test.testAvailabilityV30ToV4(MixedModeAvailabilityV30Test.java:39)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
> at org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:209)
> at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$5(IsolatedExecutor.java:109)
> at org.apache.cassandra.distributed.impl.Coordinator.executeWithResult(Coordinator.java:69)
> at org.apache.cassandra.distributed.api.ICoordinator.execute(ICoordinator.java:32)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.lambda$test$1(MixedModeAvailabilityTestBase.java:120)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.maybeFail(MixedModeAvailabilityTestBase.java:139)
> at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:119)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
> at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:136)
> at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:142)
> at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
> at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1831)
> at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1780)
> at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1718)
> at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1627)
> at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1162)
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:302)
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:115)
> at org.apache.cassandra.distributed.impl.Coordinator.executeInternal(Coordinator.java:107)
> at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:69)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
> at java.lang.Thread.run(Thread.java:748)
> {code}