RE: Configuration for new(expanding) cluster and new admins.

2022-06-16 Thread Durity, Sean R
I have run clusters with nodes of different disk sizes by giving them different 
num_tokens values. I used the basic math of just increasing num_tokens by the 
same percentage as the change in disk size. (So, if my "normal" node had 8 
tokens, one with double the disk space would get 16.)
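
A minimal sketch of that arithmetic, in case it helps. The function name and 
the TiB parameters are hypothetical, chosen only for illustration; the value 
would still be set by hand as num_tokens on the new node before it joins.

```python
def scaled_num_tokens(base_tokens: int, base_disk_tib: float, new_disk_tib: float) -> int:
    """Scale num_tokens by the same ratio as the change in disk size.

    Hypothetical helper: it only does the proportional math described above
    and knows nothing about Cassandra itself.
    """
    if base_tokens <= 0 or base_disk_tib <= 0 or new_disk_tib <= 0:
        raise ValueError("all inputs must be positive")
    # Round to the nearest whole token count, never going below 1.
    return max(1, round(base_tokens * new_disk_tib / base_disk_tib))


if __name__ == "__main__":
    # A "normal" node with 8 tokens and 4 TiB; a new node with 8 TiB.
    print(scaled_num_tokens(8, 4.0, 8.0))
```

Ratios that are not a clean multiple get rounded, so the split is only 
approximately proportional in those cases.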

One thing to watch/consider: the larger (number of tokens) * (number of nodes) 
gets, the harder repairs have to work.
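
As a rough illustration of that product, the sketch below just sums num_tokens 
across all nodes. The helper name is hypothetical, and treating the total as a 
stand-in for repair effort is an assumption, not a measurement.

```python
def total_token_ranges(tokens_per_node: list) -> int:
    """Sum num_tokens over every node (hypothetical helper).

    With identical nodes this is (number of tokens) * (number of nodes).
    """
    return sum(tokens_per_node)


if __name__ == "__main__":
    # 24 identical nodes at 16 tokens each.
    print(total_token_ranges([16] * 24))
    # The same 24 nodes plus 4 larger nodes at 32 tokens each.
    print(total_token_ranges([16] * 24 + [32] * 4))
```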


Sean R. Durity



-----Original Message-----
From: Marc Hoppins  
Sent: Wednesday, June 15, 2022 3:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Configuration for new(expanding) cluster and new admins.

Hi all,

Say we have 2 datacentres with 12 nodes in each. All hardware is the same.

4-core, 2 x HDD (eg, 4TiB)

num_tokens = 16 as a start point

If the plan is to gradually increase the nodes per DC, and new hardware will 
have more of everything, especially storage, I assume I should increase the 
num_tokens value.  Should I have started with a lower value?

What would be considered as a good adjustment for:

Any increase in number of HDD for any node?

Any increase in capacity per HDD for any node?

Is there any direct correlation between new token count and the proportional 
increase in either quantity of devices or total capacity, or is any adjustment 
purely arbitrary just to differentiate between varied nodes?

Thanks

M

RE: Configuration for new(expanding) cluster and new admins.

2022-06-16 Thread Marc Hoppins
Thanks for that info.

I did see in the documentation that a value of 16 was not recommended for >50 
hosts. Our existing HBase cluster has 76 regionservers, so I would imagine that 
(eventually) we will see a similar figure.

There will be some scenarios where an initial setup may have (e.g.) 2 x 8 HDD 
and future expansion adds either more HDDs or newer nodes with larger storage. 
It can't be guaranteed that the storage would double; it might increase by 
less than 2x, or by 3-4x the existing amount, resulting in a heterogeneous 
storage configuration.  In these cases, how would it affect efficiency if the 
token figure were the same across all nodes?

From: Elliott Sims 
Sent: Thursday, June 16, 2022 12:24 AM
To: user@cassandra.apache.org
Subject: Re: Configuration for new(expanding) cluster and new admins.

If you set a different num_tokens value for new hosts (the value should never 
be changed on an existing host), the amount of data moved to that host will be 
proportional to the num_tokens value.  So, if the new hosts are set to 32 when 
they're added to the cluster, those hosts will get twice as much data as the 
initial 16-token hosts.
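
To put rough numbers on that, here is a small sketch that estimates each 
host's share of the data as its num_tokens divided by the cluster total. The 
function and host names are hypothetical, and it assumes tokens end up evenly 
spread around the ring, which real allocation only approximates.

```python
from typing import Dict


def expected_data_share(num_tokens_by_host: Dict[str, int]) -> Dict[str, float]:
    """Estimate each host's fraction of the data from its num_tokens.

    Hypothetical helper: assumes ownership is proportional to token count
    and ignores replication and uneven token placement.
    """
    total = sum(num_tokens_by_host.values())
    if total <= 0:
        raise ValueError("cluster must have at least one token")
    return {host: tokens / total for host, tokens in num_tokens_by_host.items()}


if __name__ == "__main__":
    # Twelve original 16-token hosts plus two new 32-token hosts.
    cluster = {f"old-{i}": 16 for i in range(1, 13)}
    cluster.update({"new-1": 32, "new-2": 32})
    shares = expected_data_share(cluster)
    print(shares["old-1"], shares["new-1"])
```

Under the same assumption, giving every host the same num_tokens yields equal 
shares regardless of how much disk each host has.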

I think it's generally advised to keep a Cassandra cluster identical in terms 
of hardware and num_tokens, at least within a DC.  I suspect having a lot of 
different values would slow down Reaper significantly, but I've had decent 
results so far adding a few hosts with beefier hardware and num_tokens=32 to an 
existing 16-token cluster.

