[ 
https://issues.apache.org/jira/browse/CASSANDRA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780016#action_12780016
 ] 

Jaakko Laine commented on CASSANDRA-562:
----------------------------------------

Ah, OK. I was thinking about a more automated solution where the nodes would 
resolve the situation automatically without manual intervention.

But now that we have the 'move' and 'loadbalance' commands without automatic 
conflict resolution, these commands are potentially problematic. This applies 
especially to the latter, as the token is chosen automatically, well before 
the move itself. It then takes a rather long time before the token appears as 
a valid token in any node's metadata. As it stands, a trigger-happy admin 
might easily bootstrap a number of nodes to the same token. It would be nice 
to handle this cleanly behind the scenes if there is an easy solution.

Anyway, it seems I'm thinking about this from a more automated point of view, 
where everything, starting from the virtual computing nodes, is managed 
automatically. For such an environment it would be essential that load 
balancing and node movement can resolve conflicts automatically, but since 
we're not going that way in 0.5, we might as well fix the things that must be 
fixed and document the rest for the admins in big warning letters :)


> Handle pending range clash gracefully
> -------------------------------------
>
>                 Key: CASSANDRA-562
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-562
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Jaakko Laine
>             Fix For: 0.5
>
>
> I think there are currently some problems with bootstrapping & leaving. The 
> inherent problem is that a node one-sidedly announces that it is going to 
> leave/take a token without making sure this will not cause a conflict, and 
> we do not have a proper mechanism to clean up after a conflict. There are 
> currently ways in which simultaneous bootstrap or leave operations can leave 
> the cluster (tokenmetadata) in an inconsistent state, so we need either a 
> mechanism to decide which operation wins or a way to make sure only one 
> operation is in progress for the affected ranges at any one time.
>
> 1st option (resolve & clean up conflicts as they happen):
> We could add a local timestamp to bootstrap/leave gossip and resolve 
> conflicts based on that. This would allow us to choose one of the operations 
> unambiguously and reject all but the earliest. Theoretically, if different 
> data centers are not in perfect clock sync, this might always favor one DC 
> over the other in race situations, but it would hardly create any noticeable 
> bias. The problem is that this approach would probably end up being 
> horrendously complex (if not impossible) to do properly.
>
> 2nd option (make sure only one operation is in progress at a time for the 
> affected ranges):
> Add a new messaging "channel" that is used to agree beforehand on who is 
> going to move. (1) Before a node can start bootstrapping, it must ask 
> permission from the node whose range it is going to bootstrap into. The 
> request is accepted if no other node is currently bootstrapping there. Nodes 
> of course check their own token metadata before bootstrapping, but this does 
> not guard against two nodes bootstrapping simultaneously (that is, before 
> they see each other's gossip). The only node able to answer this reliably is 
> the bootstrap source. (2) For leaving, the node should first check with all 
> nodes that are going to have pending ranges whether it is OK to leave. If it 
> receives "OK" from all of them, it can leave. If a bootstrap or leave 
> operation is rejected, the node waits for a random time and starts over. 
> Eventually all nodes will be able to bootstrap.
> The good thing about this approach is that with a relatively small change 
> (adding one message exchange before the actual move) we could make sure that 
> all parties involved agree that the operation is OK. This is IMHO also the 
> "cleanest" way. The downside is that we will need token lease times (how long 
> the node owns the "rights" to the token) and timers to make sure that we do 
> not end up in a deadlock (or with locked ranges) in case the node does not 
> complete the operation. Network partitions, delays and clock skew might 
> create very difficult corner cases to handle.
>
> 3rd option (another approach to preventing conflicts from happening):
> (3) We might take a simplistic approach and add two new states: 
> ABOUT_TO_BOOTSTRAP and ABOUT_TO_LEAVE. Whenever a node wants to bootstrap or 
> leave, it first gossips this state with its token info and waits for some 
> time. After the wait, it checks whether it is OK to carry on with the 
> operation (that is, it has not received any bootstrap/leaving gossip that 
> would conflict with its own plans). In this case too, if the operation 
> cannot be done due to a conflict, the node waits for a random period and 
> starts over.
> This would be the easiest to implement, but it would not completely remove 
> the problem, as network partitions and other delays might still cause two 
> nodes to clash without knowledge of each other's intentions. Also, adding two 
> more gossip states would expand the state machine and make it more fragile.
>
> I don't know if I'm thinking about this the wrong way, but to me it seems 
> that resolving conflicts is very difficult, so the only option is to avoid 
> them with some mechanism that makes sure only one node is moving within the 
> affected ranges.
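
To illustrate the 1st option in the description above, here is a minimal Java 
sketch of picking a winner among conflicting bootstrap/leave announcements by 
local timestamp. The ClaimResolver and Claim names are hypothetical and are not 
Cassandra classes, and the tie-break on endpoint name is an assumption that the 
description does not specify. It shows only the selection rule, not the cleanup 
of the rejected operations, which is where the complexity would be.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Cassandra. */
final class ClaimResolver {

    /** One node's gossiped intent to take or leave a token. */
    static final class Claim {
        final String endpoint;
        final String token;
        final long timestampMillis; // local clock of the announcing node

        Claim(String endpoint, String token, long timestampMillis) {
            this.endpoint = endpoint;
            this.token = token;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Earliest timestamp wins; ties fall back to endpoint order (an assumption). */
    static Claim winner(List<Claim> conflicting) {
        return conflicting.stream()
                .min(Comparator.comparingLong((Claim c) -> c.timestampMillis)
                        .thenComparing(c -> c.endpoint))
                .orElseThrow(() -> new IllegalArgumentException("no claims"));
    }

    public static void main(String[] args) {
        List<Claim> claims = Arrays.asList(
                new Claim("10.0.0.2", "42", 2_000L),
                new Claim("10.0.0.1", "42", 1_000L));
        System.out.println(winner(claims).endpoint); // prints 10.0.0.1
    }
}
```

Because every node would apply the same deterministic rule to the same set of 
claims, they would all pick the same winner once gossip has converged. Clock 
skew between data centers shifts which claim counts as "first", as noted in 
the description.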

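The permission step of the 2nd option could look roughly like the following 
Java sketch, seen from the node that owns the affected range. RangeLeaseTable 
and its methods are hypothetical names, not part of Cassandra, and the 
30-second lease in the example is an arbitrary value chosen for illustration. 
Time is passed in explicitly so that the expiry logic is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lease table kept by the node that owns the affected range. */
final class RangeLeaseTable {

    private static final class Lease {
        final String holder;
        final long expiresAtMillis;

        Lease(String holder, long expiresAtMillis) {
            this.holder = holder;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final long leaseMillis;

    RangeLeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Grants the range unless another node holds an unexpired lease on it. */
    synchronized boolean requestPermission(String range, String requester, long nowMillis) {
        Lease current = leases.get(range);
        boolean heldByOther = current != null
                && nowMillis < current.expiresAtMillis
                && !current.holder.equals(requester);
        if (heldByOther) {
            return false; // requester should wait a random time and retry
        }
        leases.put(range, new Lease(requester, nowMillis + leaseMillis));
        return true;
    }

    /** Called when the holder completes or abandons the operation. */
    synchronized void release(String range, String holder) {
        Lease current = leases.get(range);
        if (current != null && current.holder.equals(holder)) {
            leases.remove(range);
        }
    }

    public static void main(String[] args) {
        RangeLeaseTable table = new RangeLeaseTable(30_000L);
        System.out.println(table.requestPermission("(0,100]", "nodeA", 0L));      // true
        System.out.println(table.requestPermission("(0,100]", "nodeB", 1_000L));  // false
        System.out.println(table.requestPermission("(0,100]", "nodeB", 31_000L)); // true
    }
}
```

The expiry check is what keeps a range from staying locked when the holder 
never completes its operation. It does not address partitions or clock skew 
between the requester and the range owner.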
