Ok, so vnodes are random assignments under normal circumstances (I'm on
2.1.x; I'm assuming a derivative approach was in the works that would avoid
some hot-node aspects of random primary range assignment for new nodes once
you had one or two or three in a cluster).

So... couldn't I just "engineer" the token assignments to be new primary
ranges that derive from the replica I am pulling the sstables from? Say,
take the primary range of the previous node's tokens and just take the
midpoint of its range. If we stand up enough nodes, then the implicit hot
ranging going on here is mitigated.
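For concreteness, the midpoint computation might look like the sketch below. This assumes Murmur3Partitioner's token domain of [-2^63, 2^63 - 1]; the names and the wraparound handling are illustrative, not Cassandra APIs.

```python
# Sketch of the midpoint idea in Murmur3Partitioner's token space
# [-2**63, 2**63 - 1]. All names here are illustrative, not Cassandra code.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
RING = 2**64

def midpoint(start, end):
    """Midpoint of the primary range (start, end], handling ring wraparound."""
    span = (end - start) % RING   # positive even when the range wraps past MAX_TOKEN
    mid = start + span // 2
    if mid > MAX_TOKEN:           # fold back into the signed token domain
        mid -= RING
    return mid

print(midpoint(0, 100))  # 50
```

A new node given this token would take over the second half of the old node's primary range.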

The sstables can have data in them that is outside the primary range,
correct? nodetool cleanup can get rid of the data the node doesn't need.

That leaves replicated ranges. Is there any means of replica range
distribution that isn't the node with the next range slice in the overall
primary range? If we are splitting the old node's primary range, then the
replicas would travel with it and the new node would instantly become a
replica of the old node. The next primary ranges would also have the replicas.
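To illustrate the "replicas travel with it" point, here's a SimpleStrategy-style placement sketch (an assumption for illustration, not Cassandra source): replicas for a token are its primary owner plus the next RF-1 distinct nodes clockwise.

```python
import bisect

def replicas_for(token, ring, rf):
    """ring: list of (token, node) pairs. Returns the nodes replicating `token`."""
    ring = sorted(ring)
    toks = [t for t, _ in ring]
    nodes = [n for _, n in ring]
    i = bisect.bisect_left(toks, token) % len(ring)  # primary owner, wrapping
    out = []
    for j in range(len(ring)):
        node = nodes[(i + j) % len(ring)]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
print(replicas_for(150, ring, 2))  # ['C', 'D']

# Splitting C's primary range by adding E at the midpoint token 150:
ring2 = ring + [(150, "E")]
print(replicas_for(150, ring2, 2))  # ['E', 'C'] -- E immediately replicates with C
```

Under this model, a node inserted at the midpoint does become an immediate replica peer of the node whose range it split.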


On Fri, Feb 16, 2018 at 3:58 PM, Carl Mueller <carl.muel...@smartthings.com>
wrote:

> Thanks. Yeah, it appears this would only be doable if we didn't have
> vnodes and used old single-token clusters. I guess Priam has something
> where you increase the cluster by whole-number multiples. Then there's the
> issue of doing quorum reads/writes if there suddenly is a new replica range
> with grey-area ownership/responsibility for the range, where
> LOCAL_QUORUM becomes a bit ill-defined if more than one node is being added
> to a cluster.
>
> I guess the only way that would work is if the node count were some
> multiple of the vnode count and vnodes distributed themselves consistently,
> so that expansions by RF multiples might be consistent and precomputable
> for the responsible ranges.
>
> I will read that talk.
>
> On Thu, Feb 15, 2018 at 7:39 PM, kurt greaves <k...@instaclustr.com>
> wrote:
>
>> Ben did a talk
>> <https://www.youtube.com/watch?v=mMZBvPXAhzU&index=39&list=PLqcm6qE9lgKJkxYZUOIykswDndrOItnn2>
>> that might have some useful information. It's much more complicated with
>> vnodes though and I doubt you'll be able to get it to be as rapid as you'd
>> want.
>>
>> sets up schema to match
>>
>> This shouldn't be necessary. You'd just join the node as usual but with
>> auto_bootstrap: false and let the schema be propagated.
>>
>> Is there an issue if the vnodes tokens for two nodes are identical? Do
>>> they have to be distinct for each node?
>>
>> Yeah. This is annoying I know. The new node will take over the tokens of
>> the old node, which you don't want.
>>
>>
>>> Basically, I was wondering if we just use this to double the number of
>>> nodes with identical copies of the node data via snapshots, and then later
>>> on cassandra can pare down which nodes own which data.
>>
>> There wouldn't be much point to adding nodes with the same (or almost the
>> same) tokens. That would just be shifting load. You'd essentially need a
>> very smart allocation algorithm to come up with good token ranges, but then
>> you still have the problem of tracking down the relevant SSTables from the
>> nodes. Basically, bootstrap does this for you ATM and only streams the
>> relevant sections of SSTables for the new node. If you were doing it from
>> backups/snapshots you'd need to either do the same thing (eek) or copy all
>> the SSTables from all the relevant nodes.
>>
>> With single-token nodes this becomes much easier. You can likely get away
>> with only copying around double/triple the data (depending on how you add
>> tokens to the ring, the RF, and the node count).
>>
>> I'll just put it out there that C* is a database and really isn't
>> designed to be rapidly scalable. If you're going to try, be prepared to
>> invest A LOT of time into it.
>>
>
>
