I am trying to understand the best procedure for adding new nodes.  The one 
that I see most often online seems to have a hole that carries a low 
probability of permanently losing data.  I would like to find out what I am 
missing.

Let's say I have a 3-node cluster (nodes A, B, C) with an RF of 3.  I want to 
double the cluster size to 6 (nodes A, B, C, D, E, F) while keeping the 
replication factor of 3.  Let's assume we use vnodes.

My understanding is that the procedure is to bootstrap the 3 new nodes, then 
run repair, then run cleanup.  Here is my failure case:

Before bootstrapping I have a row that is only replicated onto nodes A and B.  
Assume I did a quorum write and there was some hiccup on C, hinted handoff 
didn't work, and a repair has not yet been run.  Let's also assume that once 
nodes D, E, and F have been bootstrapped, this row's new replicas are D, E, 
and F.

My reading of the bootstrapping code shows that a given range is streamed 
from only one node (unlike repair).  There is a 1/9 chance that D, E, and F 
will all have streamed the range containing the row from C, which does not 
have this row.

Now not even a read at consistency level ALL will return the row.  A repair 
will not solve it, since none of D, E, or F has the row, and when cleanup is 
run on A and B, the row is permanently deleted.
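
To make the failure case concrete, here is a rough toy model of it in Python.  
This is not Cassandra code: the helper names (bootstrap, read_all, repair, 
cleanup, run_scenario) are made up for illustration, each node is just a set 
of row keys for the one range in question, and I am assuming each new node 
streams from one of the original nodes A, B, C.

```python
# Toy model of the failure case above; not Cassandra code.
# Each node is a set of row keys it holds for the single range in question.

def bootstrap(data, new_node, source):
    """The new node streams the range from exactly one existing replica."""
    data[new_node] = set(data[source])

def read_all(data, replicas):
    """A read at ALL merges whatever every current replica holds."""
    result = set()
    for node in replicas:
        result |= data[node]
    return result

def repair(data, replicas):
    """Repair leaves every current replica with the union of their data."""
    merged = read_all(data, replicas)
    for node in replicas:
        data[node] = set(merged)

def cleanup(data, replicas):
    """Cleanup drops the range from nodes that no longer own it."""
    for node in data:
        if node not in replicas:
            data[node] = set()

def run_scenario(sources):
    """sources[i] is the old node that D, E, F (in order) stream from."""
    data = {"A": {"row"}, "B": {"row"}, "C": set()}  # C missed the write
    for new_node, source in zip("DEF", sources):
        bootstrap(data, new_node, source)
    new_replicas = ["D", "E", "F"]
    seen_at_all = "row" in read_all(data, new_replicas)
    repair(data, new_replicas)
    cleanup(data, new_replicas)
    survives = any("row" in rows for rows in data.values())
    return seen_at_all, survives

print(run_scenario("CCC"))  # all three new nodes streamed from C
print(run_scenario("CAC"))  # one new node streamed from A
```

In this model the row is lost only when every new replica streamed from C; if 
even one of them streamed from A or B, the repair spreads the row to the 
others before cleanup runs.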

I don't think this problem would normally happen without vnodes, because when 
doubling you would alternate the new nodes with the old nodes in the ring, so 
while quorum might not work until the final repair, "all" would, and a repair 
would solve the problem.  With vnodes though, some of the ranges will follow 
the pattern above (range ownership moving from A,B,C to D,E,F).

Am I missing something here?  If I'm right, I think the only way to avoid this 
is to add fewer than a quorum of new nodes (in this case 1) before doing a 
repair.  That would be painful since repairs take a while.
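
The arithmetic behind "fewer than a quorum" can be sketched as below.  This is 
only my reasoning written out, not anything Cassandra enforces, and the 
function names are hypothetical.  It assumes the row was written at QUORUM and 
that adding n nodes replaces at most n of a range's replicas.

```python
def quorum(rf):
    """Quorum size for a replication factor."""
    return rf // 2 + 1

def max_new_nodes_between_repairs(rf):
    # A quorum write leaves at most rf - quorum(rf) old replicas without the
    # row.  If n new nodes take over a range, rf - n old replicas remain, so
    # at least one old replica with the row remains when
    # rf - n > rf - quorum(rf), i.e. when n < quorum(rf).
    return quorum(rf) - 1

for rf in (3, 5):
    print(rf, quorum(rf), max_new_nodes_between_repairs(rf))
```

For RF 3 this gives a quorum of 2, so one new node per repair cycle.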


Thanks for any help.

Eddie
