[
https://issues.apache.org/jira/browse/CASSANDRA-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423955#comment-17423955
]
Andres de la Peña commented on CASSANDRA-16953:
-----------------------------------------------
The failure can be easily reproduced with the multiplexer. For example, [this
run|https://app.circleci.com/pipelines/github/adelapena/cassandra/954/workflows/abc53f58-5585-4e85-8c24-822fb03b9d98]
with 100 repetitions reproduces the failure 26 times.
The error happens when {{AbstractCluster#close()}} shuts down the instances
concurrently. Either of the first two nodes can get a rejected connection when
trying to connect to the third node, which is the one that failed to do the
replacement and is kept running.
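To illustrate the failure shape, here is a minimal sketch of a parallel close. It uses hypothetical names ({{ParallelShutdownSketch}}, {{closeAll}}) and is not the actual {{AbstractCluster}} or {{FBUtilities}} code. The trace shows {{FBUtilities.waitOnFutures}} calling {{CompletableFuture.get}} with a timeout. The sketch copies that: every shutdown task starts at once, each future is then awaited with a per-future timeout, and the first failure is rethrown wrapped in a {{RuntimeException}}. A single shutdown that never finishes therefore surfaces as {{RuntimeException: TimeoutException}}.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; NOT the real AbstractCluster#close() or FBUtilities code.
final class ParallelShutdownSketch
{
    // Runs every shutdown task concurrently, then waits on each future with a
    // per-future timeout. The first failure is rethrown wrapped in a
    // RuntimeException, so one hung shutdown surfaces as
    // RuntimeException(TimeoutException), the same shape as the reported trace.
    static void closeAll(List<Runnable> shutdowns, long timeoutMillis)
    {
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a hung shutdown must not keep the JVM alive
            return t;
        });
        try
        {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Runnable shutdown : shutdowns)
                futures.add(CompletableFuture.runAsync(shutdown, executor));

            Throwable failure = null;
            for (CompletableFuture<Void> future : futures)
            {
                try
                {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                }
                catch (Exception e)
                {
                    if (e instanceof InterruptedException)
                        Thread.currentThread().interrupt();
                    if (failure == null)
                        failure = e;
                    else
                        failure.addSuppressed(e);
                }
            }
            if (failure != null)
                throw new RuntimeException(failure);
        }
        finally
        {
            executor.shutdownNow(); // interrupts any task that is still running
        }
    }
}
{code}

With one task that blocks, {{closeAll(tasks, 200)}} throws after about 200 ms, even though the other tasks completed. This matches the behaviour of the test, where the close of the surviving nodes is what times out.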
The test apparently
[passes|https://app.circleci.com/pipelines/github/adelapena/cassandra/959/workflows/6e8c2754-6c44-43a7-8190-8cce5b6ec604]
if we shut down the third node before shutting down the other two instances, as
done in [this
commit|https://github.com/adelapena/cassandra/commit/f730853db1cb557676ae46b248030677b895a91f],
but I'm not sure what is interfering with the parallel shutdown.
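Below is a minimal sketch of the ordering workaround under stated assumptions. It is not the code from the linked commit. The names {{OrderedShutdownSketch}} and {{close}} are hypothetical, and each node's shutdown is modelled as a plain {{Runnable}}. The node that failed the replacement is stopped alone and awaited first. The remaining nodes are then stopped concurrently with {{CompletableFuture.allOf}}.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the ordering workaround; NOT the code from the linked commit.
final class OrderedShutdownSketch
{
    static void close(Runnable leftoverNode, List<Runnable> otherNodes, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException
    {
        // Step 1: stop the node that failed the replacement on its own and wait for it.
        // Assumption: the interference comes from stopping this node at the same time
        // as the others; the actual cause is still unknown.
        CompletableFuture.runAsync(leftoverNode).get(timeoutMillis, TimeUnit.MILLISECONDS);

        // Step 2: stop the remaining nodes concurrently, as the parallel close does.
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Runnable node : otherNodes)
            futures.add(CompletableFuture.runAsync(node));
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                         .get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
{code}

This keeps most of the benefit of the parallel close, because only the one leftover node is serialized.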
> Flaky replaceAliveHost test from hostreplacement.HostReplacementTest
> --------------------------------------------------------------------
>
> Key: CASSANDRA-16953
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16953
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Ruslan Fomkin
> Assignee: Andres de la Peña
> Priority: Normal
>
> {{replaceAliveHost}} from
> {{org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest}}
> has failed a number of times in different CircleCI builds on both Java 8 and
> Java 11. See [the last
> failure|https://app.circleci.com/pipelines/github/k-rus/cassandra/14/workflows/3af46462-d162-4997-a49e-1ca10cd2392b/jobs/126/tests#failed-test-0].
> The log is the same across all failures:
> {code:java}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
> at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:70)
> at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:476)
> at org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:850)
> at org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest.replaceAliveHost(HostReplacementTest.java:145)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Caused by: java.util.concurrent.TimeoutException
> at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
> at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
> at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:468)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)