[
https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108617#comment-13108617
]
Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------
bq. It's always been unsupported to bootstrap a second node into the same
"token arc" while a previous one is ongoing.
I'm pretty sure now that this is incorrect; we fixed it back in CASSANDRA-603.
I'm updating the comments in TokenMetadata as follows:
{noformat}
// Prior to CASSANDRA-603, we just had <tt>Map<Range, InetAddress> pendingRanges</tt>,
// which was added to when a node began bootstrap and removed from when it finished.
//
// This is inadequate when multiple changes are allowed simultaneously. For example,
// suppose that there is a ring of nodes A, C and E, with replication factor 3.
// Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and C-D.
// Now suppose node B bootstraps between A and C at the same time. Its pending ranges
// would be C-E, E-A and A-B. Now both nodes need to be assigned pending range E-A,
// which we would be unable to represent with the old Map. The same thing happens
// even more obviously for any nodes that boot simultaneously between the same two nodes.
//
// So, we made two changes:
//
// First, we changed pendingRanges to a <tt>Multimap<Range, InetAddress></tt> (now
// <tt>Map<String, Multimap<Range, InetAddress>></tt>, because replication strategy
// and options are per-KeySpace).
//
// Second, we added the bootstrapTokens and leavingEndpoints collections, so we can
// rebuild pendingRanges from the complete information of what is going on, when
// additional changes are made mid-operation.
//
// Finally, note that recording the tokens of joining nodes in bootstrapTokens also
// means we can detect and reject the addition of multiple nodes at the same token
// before one becomes part of the ring.
private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();
// (don't need to record Token here since it's still part of tokenToEndpointMap until it's done leaving)
private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();
// this is a cache of the calculation from {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
{noformat}
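To make the A/C/E example in that comment concrete, here is a minimal, self-contained sketch. It is not the TokenMetadata code: the class and method names are hypothetical, tokens and ranges are plain strings rather than Token/Range objects, a JDK Map of Sets stands in for Guava's Multimap, and replica placement is assumed to be "own range plus the RF-1 preceding ranges" on a simple ring. Each joining node's ranges are computed against the settled ring only, as in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PendingRangesSketch {
    // Hypothetical helper: ranges ("start-end") that a node joining at newToken
    // would replicate, assuming simple ring placement with the given RF.
    public static List<String> pendingRangesFor(List<String> ring, String newToken, int rf) {
        TreeSet<String> tokens = new TreeSet<>(ring);
        tokens.add(newToken);
        List<String> sorted = new ArrayList<>(tokens);
        int n = sorted.size();
        int idx = sorted.indexOf(newToken);
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < Math.min(rf, n); i++) {
            // Walk backwards around the ring, wrapping at index 0.
            String end = sorted.get(((idx - i) % n + n) % n);
            String start = sorted.get(((idx - i - 1) % n + n) % n);
            ranges.add(start + "-" + end);
        }
        return ranges;
    }

    // Old shape: one endpoint per range, so a later put replaces an earlier one.
    public static Map<String, String> singleValued(List<String> ring, List<String> joining, int rf) {
        Map<String, String> pending = new HashMap<>();
        for (String node : joining)
            for (String range : pendingRangesFor(ring, node, rf))
                pending.put(range, node);
        return pending;
    }

    // Multimap-like shape: every joining node that needs a range is kept.
    public static Map<String, Set<String>> multiValued(List<String> ring, List<String> joining, int rf) {
        Map<String, Set<String>> pending = new HashMap<>();
        for (String node : joining)
            for (String range : pendingRangesFor(ring, node, rf))
                pending.computeIfAbsent(range, r -> new TreeSet<>()).add(node);
        return pending;
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("A", "C", "E");
        List<String> joining = Arrays.asList("D", "B");
        System.out.println(singleValued(ring, joining, 3).get("E-A")); // B (D was overwritten)
        System.out.println(multiValued(ring, joining, 3).get("E-A"));  // [B, D]
    }
}
```

With the single-valued map, whichever node registers second silently replaces the first for range E-A; the multi-valued map keeps both, which is the property the Multimap change provides.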
> range movements can violate consistency
> ---------------------------------------
>
> Key: CASSANDRA-2434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
> Project: Cassandra
> Issue Type: Bug
> Reporter: Peter Schuller
> Assignee: paul cannon
> Fix For: 1.0.1
>
> Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic
> involved during bootstrapping that avoids consistency level violations. If I
> recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3,
> and bootstrapping completes based on only one node, I have just violated the
> supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to
> look at which nodes are up and thinks about it carefully before
> bootstrapping, there's no guarantee.
> A complication is that not only does it depend on the use case whether this
> is an issue (if everything you do is at CL.ONE, it's fine); even in a cluster
> which is otherwise used for QUORUM operations you may wish to accept
> less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the
> number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to
> bootstrap to a local node and then do a repair to avoid sending loads of data
> across DCs while still achieving consistency. Or even if you don't care
> about the consistency issues, I don't think there is currently a way to
> bootstrap from local nodes only.)
> Thoughts?