Kurt Greaves commented on CASSANDRA-5836:

{quote}Hold on, how is this going to work at all? If the first node in new DC 
is going to bootstrap (let's assume seeds are allowed to bootstrap) it will own 
the whole token ring at first, so it will have to stream in all the data that 
exists in the source DC, times the RF(s) of new DC. Even if the new node 
doesn't die a horrible death in the process, you won't be able to add another 
node to the cluster until this is finished. And even after that, adding the 
next node to new DC will take ~50% of ownership from the first one, so you will 
need to run cleanup on the first one in the end, etc. for the rest of the new 

It is totally unpractical to add new DC this way, so I firmly believe that 
auto_bootstrap=false is here to stay for new DCs.{quote}

That will only occur if you've added the datacenter to replication prior to 
adding the nodes in the new DC. I was under the impression for a while that NTS 
keyspaces won't bootstrap across DCs, but on further testing they do. This is 
irrelevant, however, as it's not really standard practice to update your RF 
before you set up your new DC. You can, but it's generally not a good idea 
for the reason you listed, unless you're doing it on a really small 
dataset. Either way it's for an expert to decide what they want to do here.

Regardless, there's no saying we need to change the [un]documented procedure 
for adding a new DC because of this. It's perfectly acceptable to still use 
{{auto_bootstrap: false}} for a new DC, even with my code changes. It would be 
worth documenting, though, that with the current behaviour you'll stream 
*a lot* of data if you update RF prior to bootstrapping a new DC.

bq. ... I have doubts that any of these checks can be made really bullet-proof.
I was saying you'd need that check if you changed all seeds to bootstrap. 
Otherwise how would you tell if you are the first node? Currently a seed won't 
fail if you set {{auto_bootstrap: true}} and it's the first node in the 
cluster, which is what you're proposing.

bq. Well, we have this for adding new DCs, so not really that silly. It also 
doesn't have to be "we said so", for me the explanation is simple: the first 
seed node will fail the bootstrap otherwise, because there is no other nodes to 
bootstrap from yet.
I really don't think it's a good idea to change/further complicate the new 
cluster startup process. It's not the same as adding a DC (and as I've said, 
that procedure isn't necessarily correct). Many people will be relying on the 
existing behaviour pretty heavily. Complicating it by saying "now your first 
node will need auto_bootstrap: false" is not going to end well. How is it not 
simpler that the first node just doesn't bootstrap, and all others do?

bq. Again, I have serious doubts about all this automatic corner case 
detection. As I've said before I'm totally fine with making initial cluster set 
up a little bit more involved, if that makes operations on the clusters in 
production more reliable.
Cassandra is a complex beast. I've been in the startup code pretty heavily and 
I know first hand from working with hundreds of clusters that the 
startup/bootstrapping code is a nightmare. All I'm proposing is we change the 
"seeds don't bootstrap regardless of configuration" logic to "the first node in 
the cluster doesn't bootstrap, all other nodes respect the auto_bootstrap 
setting". IMO this reduces the number of corner cases because you no longer 
have to think about replacing nodes, new DCs, replacing nodes as seeds, new 
nodes as seeds, new nodes, or having conflicting configs like 
{{auto_bootstrap: true}} but being a seed.
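
The difference between the two rules can be sketched roughly as below. This is 
only an illustration of the decision described above, not the patch on this 
ticket: the class {{BootstrapDecision}}, both method names, and the boolean 
parameters are hypothetical, and how a node would actually determine that it is 
the first node in the cluster is assumed to be solved elsewhere.

```java
// Hypothetical sketch: none of these names come from the Cassandra codebase.
final class BootstrapDecision {
    private BootstrapDecision() {}

    // Behaviour described in this thread as current: a node listed as a seed
    // never bootstraps, whatever auto_bootstrap is set to.
    static boolean currentRule(boolean autoBootstrap, boolean isSeed) {
        return autoBootstrap && !isSeed;
    }

    // Proposed behaviour: only the first node in the cluster skips bootstrap;
    // every other node, seed or not, follows its auto_bootstrap setting.
    // Detecting "first node in the cluster" is assumed to happen elsewhere.
    static boolean proposedRule(boolean autoBootstrap, boolean isFirstNodeInCluster) {
        return autoBootstrap && !isFirstNodeInCluster;
    }

    public static void main(String[] args) {
        // A replacement seed with auto_bootstrap: true, joining an existing cluster.
        System.out.println("current:  " + currentRule(true, true));
        System.out.println("proposed: " + proposedRule(true, false));
    }
}
```

Under the proposed rule the seed list no longer appears in the decision at all, 
so {{auto_bootstrap: false}} remains available for the new-DC procedure.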

> Seed nodes should be able to bootstrap without manual intervention
> ------------------------------------------------------------------
>                 Key: CASSANDRA-5836
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bill Hathaway
>            Priority: Minor
> The current logic doesn't allow a seed node to be bootstrapped.  If a user 
> wants to bootstrap a node configured as a seed (for example to replace a seed 
> node via replace_token), they first need to remove the node's own IP from the 
> seed list, and then start the bootstrap process.  This seems like an 
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a 
> seed node to bootstrap without manual intervention when there are other seed 
> nodes up in a ring.

This message was sent by Atlassian JIRA
