[
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Tunnicliffe updated CASSANDRA-19344:
----------------------------------------
Authors: Marcus Eriksson, Sam Tunnicliffe (was: Sam Tunnicliffe)
Test and Documentation Plan:
New and existing tests in attached CI results.
The few failures/errors in the attached results are known (CASSANDRA-19343) or
look to be infra-related.
Status: Patch Available (was: In Progress)
When an instance moves from a transient to a full replica for a given range, it
must begin acting as a full replica for writes before it does so for reads.
Otherwise, consistency can be violated in two ways. First, data streamed to the
instance early in the operation can be removed by cleanup if cleanup runs before
the instance assumes responsibility for full writes. Second, coordinators can
route read requests to instances that may not have received all preceding
writes, causing unnecessary read repair or potentially inconsistent results.
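The ordering argument above can be illustrated with a toy model. This is only a
sketch under stated assumptions: ReplicaSketch, TransitionOrderSketch, and the
rule that a transient replica retains no writes are hypothetical
simplifications, not Cassandra's actual classes or behaviour, and streaming and
cleanup are left out entirely.

```java
// Toy model only: the types and rules here are illustrative assumptions,
// not Cassandra's replica or cluster metadata classes.
final class ReplicaSketch {
    enum Role { TRANSIENT, FULL }

    Role writeRole = Role.TRANSIENT;
    Role readRole = Role.TRANSIENT;
    private final java.util.Set<Integer> data = new java.util.HashSet<>();

    // Simplifying assumption: only a replica that is full for writes retains a write.
    void write(int key) {
        if (writeRole == Role.FULL) {
            data.add(key);
        }
    }

    // In this model, coordinators send full reads only to replicas that are full for reads.
    boolean servesFullReads() {
        return readRole == Role.FULL;
    }

    boolean has(int key) {
        return data.contains(key);
    }
}

final class TransitionOrderSketch {
    // Returns true if, at any point, the replica serves full reads while
    // missing a write that arrived during the transition.
    static boolean staleReadPossible(boolean writesFirst) {
        ReplicaSketch replica = new ReplicaSketch();
        int key = 42;

        // Step 1 of the two-step promotion: either writes first or reads first.
        if (writesFirst) {
            replica.writeRole = ReplicaSketch.Role.FULL;
        } else {
            replica.readRole = ReplicaSketch.Role.FULL;
        }

        // A client write arrives between the two steps.
        replica.write(key);
        boolean staleMidway = replica.servesFullReads() && !replica.has(key);

        // Step 2 completes the promotion.
        replica.writeRole = ReplicaSketch.Role.FULL;
        replica.readRole = ReplicaSketch.Role.FULL;
        boolean staleAfter = replica.servesFullReads() && !replica.has(key);

        return staleMidway || staleAfter;
    }

    public static void main(String[] args) {
        System.out.println("writes first, stale read possible: " + staleReadPossible(true));
        System.out.println("reads first, stale read possible: " + staleReadPossible(false));
    }
}
```

In this model, promoting writes first never exposes a replica that serves full
reads without the intervening write, whereas promoting reads first does.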
The root cause of the flaky test failure that originally produced this issue
was that, in {{TransientRangeMovementTest::testRemoveNode}}, cleanup happened to
run on one instance before that instance had enacted the final step of the
removal operation, leading it to remove more data than it should have.
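A minimal sketch of why the timing of cleanup matters, assuming cleanup simply
drops every token outside the ranges the instance currently believes it owns.
CleanupTimingSketch, its Range type, and the token values are hypothetical and
are not taken from the test or from Cassandra's cleanup code.

```java
// Toy model only: a hypothetical cleanup that keeps tokens inside the ranges
// the instance currently believes it owns and drops everything else.
final class CleanupTimingSketch {
    // Half-open token range [start, end); illustrative type, not Cassandra's Range.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(int token) {
            return token >= start && token < end;
        }
    }

    static java.util.SortedSet<Integer> cleanup(java.util.SortedSet<Integer> stored,
                                                java.util.List<Range> owned) {
        java.util.SortedSet<Integer> kept = new java.util.TreeSet<>();
        for (int token : stored) {
            for (Range range : owned) {
                if (range.contains(token)) {
                    kept.add(token);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Token 25 stands for data already streamed in for a range the instance will own.
        java.util.SortedSet<Integer> stored =
            new java.util.TreeSet<>(java.util.Arrays.asList(5, 15, 25));

        // Ownership as seen before and after the final step of the operation.
        java.util.List<Range> beforeFinalStep = java.util.Arrays.asList(new Range(0, 20));
        java.util.List<Range> afterFinalStep = java.util.Arrays.asList(new Range(0, 30));

        System.out.println("cleanup before final step keeps: " + cleanup(stored, beforeFinalStep));
        System.out.println("cleanup after final step keeps:  " + cleanup(stored, afterFinalStep));
    }
}
```

With the stale view of ownership, the streamed token 25 is discarded; once the
final step has been enacted, the same cleanup retains it.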
> Test Failure:
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17
> ----------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
> Issue Type: Bug
> Components: CI
> Reporter: Ekaterina Dimitrova
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), (31,50)]
>     at org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>     at org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> as in here -
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 6d194555-f6eb-41d0-c000-000000000003, --force] was not successful
> stdout:
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use decommission command to remove it from the ring
> -- StackTrace --
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and owns this ID. Use decommission command to remove it from the ring
>     at org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>     at org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>     at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020)
>     at org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51)
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373)
>     at org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272)
>     at org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>     at org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> Notifications:
> Error: java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and owns this ID. Use decommission command to remove it from the ring
>     at org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>     at org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>     at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020)
>     at org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51)
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373)
>     at org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272)
>     at org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>     at org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:833)
>     at org.apache.cassandra.distributed.api.NodeToolResult$Asserts.fail(NodeToolResult.java:214)
>     at org.apache.cassandra.distributed.api.NodeToolResult$Asserts.success(NodeToolResult.java:97)
>     at org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:173)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> as in here -
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2634/workflows/24617d26-e297-4857-bc43-b6a04e64a6ea/jobs/54534/tests#failed-test-0