[ 
https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297360#comment-15297360
 ] 

Paulo Motta commented on CASSANDRA-11848:
-----------------------------------------

Reproduced this with a [simple replace_address 
dtest|https://github.com/pauloricardomg/cassandra-dtest/blob/f2b023ac8da68b31288221f95d21bed235d93ba4/replace_address_test.py#L460].
 Also [added bootstrap 
dtests|https://github.com/pauloricardomg/cassandra-dtest/blob/f2b023ac8da68b31288221f95d21bed235d93ba4/bootstrap_test.py#L180]
 to verify that bootstrap fails if any replica is down when 
{{cassandra.consistent.rangemovement=true}} or if more than RF replicas are 
down and {{cassandra.consistent.rangemovement=false}}.

What happens is that the {{replace_address}} node does not consider itself a 
pending endpoint, but instead replaces the old node with itself in 
{{TokenMetadata}}, so it considers itself a valid source in 
{{RangeStreamer.getRangeFetchMap}}, even though it only streams from other 
replicas. In practice, this means the replacing node only streams from alive 
replicas and silently ignores down replicas (even if all other replicas are 
down).
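
To make the failure mode concrete, here is a rough Python model of the source selection described above. It is only a sketch, not the actual {{RangeStreamer}} code: the function name, the {{is_alive}} callback (standing in for a failure-detector source filter), and the addresses are all assumptions made for illustration.

```python
def get_range_fetch_map(ranges_with_sources, is_alive, local_address):
    """Simplified model: map each chosen source to the ranges fetched from it."""
    fetch_map = {}
    for rng, sources in ranges_with_sources.items():
        found_source = False
        for address in sources:
            if address == local_address:
                # The local node counts as a source but is never streamed from.
                found_source = True
                continue
            if not is_alive(address):
                continue  # dropped, as a failure-detector filter would do
            fetch_map.setdefault(address, []).append(rng)
            found_source = True
            break  # one remote source per range is enough
        if not found_source:
            raise RuntimeError(f"unable to find sufficient sources for range {rng}")
    return fetch_map


# Hypothetical scenario: the replacing node reuses the replaced node's
# address (10.0.0.1) and both other replicas are considered down.
ranges = {"(0,100]": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
plan = get_range_fetch_map(ranges, lambda address: False, "10.0.0.1")
print(plan)  # {}
```

No error is raised and the resulting plan is empty, which is the "successful" replace that streams nothing.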

Considering the local node a valid source was added in CASSANDRA-4200 since 
it's a valid scenario during single-node moves. While CASSANDRA-8523 should fix 
this by making replace go through the normal bootstrap path, the simple fix for 
now is to not consider the local node a valid source during 
bootstraps/replaces. This does not affect CASSANDRA-4200 dtest 
{{topology_test.py:TestTopology.move_single_node_test}}.
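
The intent of the fix can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than the patch itself: {{pick_sources}}, the {{allow_local_source}} flag, the {{is_alive}} callback, and the addresses are hypothetical names that do not appear in the Cassandra code base.

```python
def pick_sources(ranges_with_sources, is_alive, local_address, allow_local_source):
    """Simplified model of source selection; not the actual patch."""
    fetch_map = {}
    for rng, sources in ranges_with_sources.items():
        found_source = False
        for address in sources:
            if address == local_address:
                # Never stream from ourselves; only a move may count the
                # local node as a source that already holds the range.
                if allow_local_source:
                    found_source = True
                continue
            if not is_alive(address):
                continue  # down replicas are filtered out
            fetch_map.setdefault(address, []).append(rng)
            found_source = True
            break  # one remote source per range is enough
        if not found_source:
            raise RuntimeError(f"unable to find sufficient sources for range {rng}")
    return fetch_map


ranges = {"(0,100]": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
all_down = lambda address: False

# Move: the local node is still an acceptable source, so an empty plan is fine.
move_plan = pick_sources(ranges, all_down, "10.0.0.1", allow_local_source=True)

# Bootstrap/replace: with every other replica down, the operation now fails.
try:
    pick_sources(ranges, all_down, "10.0.0.1", allow_local_source=False)
    replace_failed = False
except RuntimeError:
    replace_failed = True
```

In this model the single-node move keeps its existing behaviour, while a bootstrap or replace with no live remote source errors out instead of executing an empty stream plan.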

Patch and tests below:
||2.1||2.2||3.0||3.7||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11848]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.7...pauloricardomg:3.7-11848]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11848]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:11848]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.7-11848-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11848-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.7-11848-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11848-dtest/lastCompletedBuild/testReport/]|

For some reason I'm not able to submit tests to cassCI. I will try again later 
and report back here when tests are available.

> replace address can "succeed" without actually streaming anything
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-11848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremiah Jordan
>            Assignee: Paulo Motta
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> When you do a replace address and the new node has the same IP as the node it 
> is replacing, then the following check can let the replace be successful even 
> if we think all the other nodes are down: 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> Because the FailureDetectorSourceFilter will exclude the other nodes, an empty 
> stream plan gets executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
