I am trying to understand the best procedure for adding new nodes. The one that I see most often online seems to have a hole where there is a low probability of permanently losing data. I want to understand what I am missing in my understanding.
Let's say I have a 3 node cluster (node A,B,C) with a RF of 3. I want to double the cluster size to 6 (node A,B,C,D,E,F) while keeping the replication factor of 3. Let's assume we use vnodes. My understanding is to bootstrap the 3 nodes and then run repair then cleanup. Here is my failure case: Before bootstrapping I have a row that is only replicated onto node A and B. Assume I did a quorum write and there was some hiccup on C, hinted handoff didn't work, and a repair has not yet been run. Let's also assume that once nodes D,E, F have been bootstrapped, this rows new replicas are D,E, and F. My reading through the bootstrapping code shows that for a given range, it streams it only from one node (unlike repair). There is a 1/9 chance that D,E,F will have streamed the range containing the row from C, which does not have this row. Now not even a consistency level read of ALL will return the row. A repair will not solve it, and when cleanup is run, the row is permanently deleted. I don't think this problem would normally happen without vnodes, because when doubling you would alternate the new nodes with the old nodes in the ring, so while quorum might not work until the final repair, "all" would, and a repair would solve the problem. With vnodes though, some of the ranges will follow the pattern above (range ownership moving from A,B,C to D,E,F). Am I missing something here? If I'm right, I think the only way to avoid this is adding less then a quorum of new nodes (in this case 1) before doing a repair. That would be painful since repairs take a while. Thanks for any help. Eddie