[
https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297360#comment-15297360
]
Paulo Motta commented on CASSANDRA-11848:
-----------------------------------------
Reproduced this with a [simple replace_address
dtest|https://github.com/pauloricardomg/cassandra-dtest/blob/f2b023ac8da68b31288221f95d21bed235d93ba4/replace_address_test.py#L460].
Also [added bootstrap
dtests|https://github.com/pauloricardomg/cassandra-dtest/blob/f2b023ac8da68b31288221f95d21bed235d93ba4/bootstrap_test.py#L180]
to verify that bootstrap fails if any replica is down when
{{cassandra.consistent.rangemovement=true}} or if more than RF replicas are
down and {{cassandra.consistent.rangemovement=false}}.
What happens is that the {{replace_address}} node does not consider itself a
pending endpoint, but instead replaces the old node with itself in
{{TokenMetadata}}, so it considers itself a valid source in
{{RangeStreamer.getRangeFetchMap}}, even though it only streams from other
replicas. In practice, this means the replacing node only streams from live
replicas and silently ignores down replicas (even if all other replicas are
down).
Considering the local node a valid source was added in CASSANDRA-4200, since
it's a valid scenario during single-node moves. While CASSANDRA-8523 should fix
this by making replace go through the normal bootstrap path, the simple fix for
now is to not consider the local node a valid source during
bootstraps/replaces. This does not affect the CASSANDRA-4200 dtest
{{topology_test.py:TestTopology.move_single_node_test}}.
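To make the failure mode concrete, here is a minimal Python model of the source
selection described above. It is only a sketch: the function name, the
{{allow_local_source}} flag, the error message and the addresses are
illustrative assumptions, not the actual {{RangeStreamer}} code or the patch.

```python
def get_range_fetch_map(range_replicas, local, is_alive, allow_local_source):
    """Pick one live source per range, or raise if a range has none.

    Simplified, hypothetical model (not Cassandra's real API).
    range_replicas: dict mapping a range name to its replica addresses.
    allow_local_source: True models the old behaviour, where the local
    node counts as a valid source; False models the proposed fix.
    """
    fetch_map = {}
    for rng, replicas in range_replicas.items():
        found = False
        for address in replicas:
            if address == local:
                if allow_local_source:
                    # The range counts as covered, but nothing is added
                    # to the plan because we never stream from ourselves.
                    found = True
                continue
            if is_alive(address):
                fetch_map.setdefault(address, []).append(rng)
                found = True
                break
        if not found:
            raise RuntimeError("no live source for range %s" % rng)
    return fetch_map


# Replacing node reuses the old node's IP; every other replica is down.
replicas = {"r1": ["127.0.0.1", "127.0.0.2", "127.0.0.3"]}
all_down = lambda address: False

# Old behaviour: an empty fetch map, so the replace "succeeds" with no data.
empty_plan = get_range_fetch_map(replicas, "127.0.0.1", all_down, True)

# Fixed behaviour: the same situation is reported as an error.
try:
    get_range_fetch_map(replicas, "127.0.0.1", all_down, False)
    fix_raises = False
except RuntimeError:
    fix_raises = True
```

In this model a single-node move would keep {{allow_local_source=True}}, while
bootstrap and replace would pass {{False}}.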
Patch and tests below:
||2.1||2.2||3.0||3.7||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.7...pauloricardomg:3.7-11848]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11848]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11848]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.7-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11848-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.7-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11848-dtest/lastCompletedBuild/testReport/]|
For some reason I'm not able to submit tests to cassCI. I will try again later
and report back here when tests are available.
> replace address can "succeed" without actually streaming anything
> -----------------------------------------------------------------
>
> Key: CASSANDRA-11848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jeremiah Jordan
> Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> When you do a replace address and the new node has the same IP as the node it
> is replacing, then the following check can let the replace be successful even
> if we think all the other nodes are down:
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> As the FailureDetectorSourceFilter will exclude the other nodes, so an empty
> stream plan gets executed.