[ 
https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090577#comment-13090577
 ] 

Nick Bailey commented on CASSANDRA-957:
---------------------------------------

So a few questions:

* In Gossiper.doStatusCheck() you made it ignore any state that is for the 
local endpoint and is not a dead state. Shouldn't it just always ignore any 
state about the local endpoint though? Basically what it was doing previously?
* Basically the same question about Gossiper.applyStateLocally(): the loop 
continues if the state is for the local node and the state is dead. Why would 
we want to apply a live local state?
* Does the hibernate state need the true/false value? Seems like all we care 
about is that it is set at all. Looks like when we are starting up right now 
we automatically go into a hibernate state, then go into a bootstrap state 
afterwards if the user specified a replace token. Seems like we shouldn't set 
a state at all until we know we are doing one of replace/bootstrap/just 
joining.
* It looks like right now you could specify a replace token that isn't part of 
the cluster. If that happens we should throw an exception and tell the user to 
do the normal bootstrap process.
* Why use the last gossip time to determine if the node we are replacing is 
alive? Why not just check gossip to see if the ring thinks it is alive?
* We should update the message for the exception that is thrown when you 
try to bootstrap to an existing token. It should tell the user to either 
remove the dead node or follow this replacement process.
* I'm not sure why we are calling updateNormalToken() in the 
StorageService.bootstrap() method when it's a token replacement.
* A little bit of doc on this would be good, maybe in cassandra.yaml? Just on 
how to pass the argument to the startup process.
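
To make the replace-token validation points above concrete, here is a rough 
sketch of the check I have in mind. It is not taken from the patch: the class 
name, the method name, and the token-to-liveness map are all hypothetical 
stand-ins for whatever ring and gossip state the real code would consult.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not code from the patch.
class ReplaceTokenCheck {
    /**
     * ownerAliveByToken is an assumed view of the ring: each known token maps
     * to whether gossip currently considers its owner alive.
     */
    static void validateReplaceToken(String token, Map<String, Boolean> ownerAliveByToken) {
        Boolean alive = ownerAliveByToken.get(token);
        if (alive == null) {
            // The token is not part of the cluster, so there is nothing to replace.
            throw new IllegalArgumentException(
                "Token " + token + " is not in the ring; use the normal bootstrap process.");
        }
        if (alive) {
            // The current owner is still up according to gossip.
            throw new IllegalStateException(
                "Token " + token + " belongs to a live node and cannot be replaced.");
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> ring = new HashMap<>();
        ring.put("100", true);   // owner alive
        ring.put("200", false);  // owner dead
        validateReplaceToken("200", ring); // passes: token known, owner dead
        System.out.println("token 200 can be replaced");
    }
}
```

The liveness flag here would come straight from what gossip reports, rather 
than from comparing last gossip times.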

I also need to dive into the hint stuff a little bit more, I'm less familiar 
with that code.


> convenience workflow for replacing dead node
> --------------------------------------------
>
>                 Key: CASSANDRA-957
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-957
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core, Tools
>    Affects Versions: 0.8.2
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.0
>
>         Attachments: 0001-Support-Token-Replace.patch, 
> 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 
> 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 
> 0002-Do-not-include-local-node-when-computing-workMap.patch, 
> 0002-Rework-Hints-to-be-on-token.patch, 
> 0002-Rework-Hints-to-be-on-token.patch, 
> 0002-upport-for-hints-on-token-v3.patch, 
> 0003-Make-HintedHandoff-More-reliable.patch, 
> 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Replacing a dead node with a new one is a common operation, but "nodetool 
> removetoken" followed by bootstrap is inefficient (re-replicating data first 
> to the remaining nodes, then to the new one) and manually bootstrapping to a 
> token "just less than" the old one's, followed by "nodetool removetoken" is 
> slightly painful and prone to manual errors.
> First question: how would you expose this in our tool ecosystem?  It needs to 
> be a startup-time option to the new node, so it can't be nodetool, and 
> messing with the config xml definitely takes the "convenience" out.  A 
> one-off -DreplaceToken=XXY argument?
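
As a minimal illustration of the one-off startup argument floated in the 
description, the sketch below reads a JVM system property at startup. The 
property name "replaceToken" is simply the one suggested in the description; 
the class and method names are hypothetical, and nothing here reflects what 
the attached patches actually do.

```java
// Hypothetical sketch of reading a one-off -DreplaceToken=... startup option.
class ReplaceTokenOption {
    // Property name assumed from the "-DreplaceToken=XXY" suggestion.
    static final String PROPERTY = "replaceToken";

    /** Returns the trimmed token, or null when the option is absent or blank. */
    static String readReplaceToken() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        String token = readReplaceToken();
        System.out.println(token == null
            ? "no replace token given, normal startup"
            : "replacing token " + token);
    }
}
```

Under these assumptions the new node would be started with something like 
"java -DreplaceToken=XXY ...", with no config file edits needed.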

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
