[jira] [Updated] (CASSANDRA-11688) Replace_address should sanity check prior node state before migrating tokens

Jonathan Shook (JIRA) Fri, 29 Apr 2016 14:00:27 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jonathan Shook updated CASSANDRA-11688:
---------------------------------------
    Description: 
During a node replacement, the replace_address that was supplied belonged to a 
different node than the intended one. As a result, both nodes remained active 
after the replacement node came up. This caused several other issues that were 
difficult to diagnose, including invalid gossip state.

Replace_address should be more robust in this scenario. It would be much more 
user friendly if the replace_address logic would first do some basic sanity 
checks, possibly to include:

- If the address differs from all local interface addresses, pinging the other 
node to confirm that it is indeed “down”.
- Checking gossip state of the node to verify that it is not known to peers.

It may even be safest to require, by default, that both address reachability 
and gossip state show the replace_address as down before allowing any token 
migration or other replace_address actions to occur.

If the replace_address is not ready to be replaced, the log should indicate 
that you are trying to replace an active node, and Cassandra should refuse to 
start.

  was:
During a node replacement, a customer used an ip address associated with a 
different node than the intended one. The result was that both nodes remained 
active after the node came up. This caused several other issues which were 
difficult to diagnose, including invalid gossip state, etc.

Replace_address should be more robust in this scenario. It would be much more 
user friendly if the replace_address logic would first do some basic sanity 
checks, possibly to include:

- Pinging the other node to see if it is indeed “down”, if the address is 
different than all local interface addresses
- Checking gossip state of the node to verify that it is not known to peers.

It may even be safest to require that both address reachability and gossip 
state are required to show the replace_address as down by default before 
allowing any token migration or other replace_address actions to occur.

In the case that the replace_address is not ready to be replaced, the log 
should indicate that you are trying to replace an active node, and cassandra 
should refuse to start.


> Replace_address should sanity check prior node state before migrating tokens
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11688
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Shook
>
> During a node replacement, the replace_address that was supplied belonged to 
> a different node than the intended one. As a result, both nodes remained 
> active after the replacement node came up. This caused several other issues 
> that were difficult to diagnose, including invalid gossip state.
>
> Replace_address should be more robust in this scenario. It would be much more 
> user friendly if the replace_address logic would first do some basic sanity 
> checks, possibly to include:
>
> - If the address differs from all local interface addresses, pinging the 
> other node to confirm that it is indeed “down”.
> - Checking gossip state of the node to verify that it is not known to peers.
>
> It may even be safest to require, by default, that both address reachability 
> and gossip state show the replace_address as down before allowing any token 
> migration or other replace_address actions to occur.
>
> If the replace_address is not ready to be replaced, the log should indicate 
> that you are trying to replace an active node, and Cassandra should refuse to 
> start.
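
The proposed checks could be sketched roughly as follows. This is an 
illustration only, not Cassandra code: the function names, the shape of the 
gossip input (a mapping from peer address to whether that peer reports the 
target as up), and the use of a TCP connect to port 7000 in place of an ICMP 
ping are all assumptions made for the example.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, Mapping


def tcp_probe(address: str, port: int = 7000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to address:port succeeds.

    Stand-in for "pinging" the node (assumption): ICMP usually needs
    elevated privileges, so this tries a plain TCP connect instead.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def replace_address_problems(
    replace_address: str,
    local_addresses: Iterable[str],
    peers_seeing_target_up: Mapping[str, bool],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Return the reasons the replacement should be refused (empty if none).

    All names here are hypothetical. Both checks must pass: the address
    must be unreachable and no peer may report it as up.
    """
    target = ipaddress.ip_address(replace_address)
    local = {ipaddress.ip_address(a) for a in local_addresses}
    problems = []

    # Probe only when the address is not one of this host's own interfaces.
    if target not in local and probe(replace_address):
        problems.append(f"{replace_address} is reachable over the network")

    # Any peer that still reports the target as up blocks the replacement.
    up_peers = sorted(p for p, up in peers_seeing_target_up.items() if up)
    if up_peers:
        problems.append(
            f"{replace_address} is reported up by peers: {', '.join(up_peers)}"
        )
    return problems
```

A caller would log each returned problem as an attempt to replace an active 
node and refuse to start when the list is non-empty. Passing the probe in as 
a parameter keeps the decision logic testable without touching the network.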



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
