Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

kurt greaves Thu, 15 Feb 2018 17:40:33 -0800

Ben did a talk
<https://www.youtube.com/watch?v=mMZBvPXAhzU&index=39&list=PLqcm6qE9lgKJkxYZUOIykswDndrOItnn2>
that might have some useful information. It's much more complicated with
vnodes though and I doubt you'll be able to get it to be as rapid as you'd
want.


> sets up schema to match

This shouldn't be necessary. You'd just join the node as usual but with
auto_bootstrap: false and let the schema be propagated.

> Is there an issue if the vnodes tokens for two nodes are identical? Do they
> have to be distinct for each node?

Yeah, they have to be distinct. This is annoying, I know. If the new node
comes up with the same tokens as an old node, it will take over those
tokens from the old node, which you don't want.
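
If you do end up generating initial_token lists by hand, it's worth checking
them for clashes before starting anything. A rough sketch of such a check;
the function name, node addresses and token values below are made up for
illustration and nothing here talks to Cassandra:

```python
from collections import defaultdict


def find_token_collisions(tokens_by_node):
    """Return {token: sorted list of nodes} for tokens claimed by 2+ nodes."""
    owners = defaultdict(set)
    for node, tokens in tokens_by_node.items():
        for token in tokens:
            owners[token].add(node)
    return {t: sorted(nodes) for t, nodes in owners.items() if len(nodes) > 1}


# Hypothetical planned ring: the third node was cloned from a snapshot of
# the first and still carries one of its tokens.
planned_ring = {
    "10.0.0.1": [-9000, -3000, 4000],
    "10.0.0.2": [-6000, 0, 7000],
    "10.0.0.3": [-3000, 2000, 9000],
}

print(find_token_collisions(planned_ring))
```

Any token that shows up in the result would be taken over by whichever node
joins with it last.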


> Basically, I was wondering if we just use this to double the number of
> nodes with identical copies of the node data via snapshots, and then later
> on cassandra can pare down which nodes own which data.

There wouldn't be much point to adding nodes with the same (or almost the
same) tokens. That would just be shifting load. You'd essentially need a
very smart allocation algorithm to come up with good token ranges, but then
you still have the problem of tracking down the relevant SSTables from the
nodes. Basically, bootstrap does this for you ATM and only streams the
relevant sections of SSTables for the new node. If you were doing it from
backups/snapshots you'd need to either do the same thing (eek) or copy all
the SSTables from all the relevant nodes.

With single token nodes this becomes much easier. You can likely get away
with only copying around double/triple the data (depending on how you add
tokens to the ring and RF and node count).
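
As a rough illustration of one way to add tokens to a single-token ring:
bisect every existing range so each new node lands halfway between two old
ones. This is only a sketch. It assumes Murmur3Partitioner (tokens from
-2^63 to 2^63-1), the function names are made up, and it says nothing about
which SSTables you'd then have to copy to each new node:

```python
MIN_TOKEN = -(2 ** 63)   # lower bound of the Murmur3Partitioner token range
RING_SIZE = 2 ** 64      # number of distinct token values on the ring


def midpoint(left, right):
    """Token halfway along the clockwise arc from left to right."""
    span = (right - left) % RING_SIZE
    if span == 0:
        # A single node owns the whole ring.
        span = RING_SIZE
    mid = left + span // 2
    # Wrap back into the valid token range.
    return (mid - MIN_TOKEN) % RING_SIZE + MIN_TOKEN


def doubling_tokens(existing_tokens):
    """One new token per existing range, each bisecting that range."""
    ring = sorted(existing_tokens)
    return [
        midpoint(ring[i], ring[(i + 1) % len(ring)])
        for i in range(len(ring))
    ]


# Tiny made-up ring, just to show the wrap-around case.
print(doubling_tokens([-8, 0, 4]))
```

Each value would go into initial_token for one new node. Real rings will
want more care than this, e.g. around rack/DC placement.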

I'll just put it out there that C* is a database and really isn't designed
to be rapidly scalable. If you're going to try, be prepared to invest A LOT
of time into it.
