Re: Configuration for new(expanding) cluster and new admins.
One of the advantages of faster streaming in 4.0+ is that it’s now very much viable to do this entirely with bootstraps and decoms in the same DC, for use cases where you can’t just change DC names.

Vnodes will cause more compaction than single token, but you can just add in all the extra hosts (running cleanup on the existing hosts once the new ones are in the cluster), allow them to be underutilized, and then decommission the old hosts.

In this order they’ll never have more load than they started with, it’s strictly correct from a data visibility standpoint, and the bootstraps at the beginning drop the load on everything very quickly, so the rest of the operations are done at low CPU load relative to your starting point.

The decoms will cause some compaction, so don’t rush those.

> On Jun 20, 2022, at 9:45 AM, Elliott Sims wrote:
>
> If the token value is the same across heterogeneous nodes, it means that each node gets a (roughly) equivalent amount of data and work to do. So the bigger servers would be under-utilized.
>
> My answer so far to varied hardware getting out of hand is a periodic hardware refresh and "datacenter" migration. Stand up a logical "datacenter" with all-new uniform denser hardware and a uniform vnode count (probably 16), migrate to it, tear down the old hardware.
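The claim that no host ever carries more than it started with can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: every host has the same num_tokens, ownership is perfectly even, and the number of new hosts equals the number of old hosts (12, as in the original question). The function name `share_timeline` is hypothetical and is not part of any Cassandra tooling.

```python
from fractions import Fraction


def share_timeline(old_hosts: int, new_hosts: int) -> list[tuple[str, Fraction]]:
    """Per-host share of the data after each step of add-then-decommission.

    Assumes equal num_tokens on every host and perfectly even ownership,
    so each live host owns 1/N of the data when N hosts are in the ring.
    """
    if old_hosts < 1 or new_hosts < 1:
        raise ValueError("need at least one old and one new host")
    total = old_hosts
    timeline = [("start", Fraction(1, total))]
    # Bootstrap every new host first: each one lowers everyone's share.
    for i in range(new_hosts):
        total += 1
        timeline.append((f"bootstrap {i + 1}", Fraction(1, total)))
    # Then decommission the old hosts one at a time: shares climb back up.
    for i in range(old_hosts):
        total -= 1
        timeline.append((f"decommission {i + 1}", Fraction(1, total)))
    return timeline


timeline = share_timeline(old_hosts=12, new_hosts=12)
peak = max(share for _, share in timeline)
print("starting share:", timeline[0][1])
print("peak share:    ", peak)
```

With equal counts of old and new hosts, the peak share equals the starting share and is reached only after the last decommission. If fewer new hosts replace the old ones, the final share per host is necessarily higher than the starting one, which is where denser hardware comes in.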
Re: Configuration for new(expanding) cluster and new admins.
If the token value is the same across heterogeneous nodes, it means that each node gets a (roughly) equivalent amount of data and work to do. So the bigger servers would be under-utilized.

My answer so far to varied hardware getting out of hand is a periodic hardware refresh and "datacenter" migration. Stand up a logical "datacenter" with all-new uniform denser hardware and a uniform vnode count (probably 16), migrate to it, tear down the old hardware.

On Thu, Jun 16, 2022 at 12:31 AM Marc Hoppins wrote:

> Thanks for that info.
>
> I did see in the documentation that a value of 16 was not recommended for >50 hosts. Our existing hbase is 76 regionservers so I would imagine that (eventually) we will see a similar figure.
>
> There will be some scenarios where an initial setup may have (eg) 2 x 8 HDD and future expansion adds either more HDD or newer nodes with larger storage. It couldn’t be guaranteed that the storage would double but might increase by either less than 2x, or 3-4 x existing amount resulting in a heterogeneous storage configuration. In these cases how would it affect efficiency if the token figure were the same across all nodes?

--
This email, including its contents and any attachment(s), may contain confidential and/or proprietary information and is solely for the review and use of the intended recipient(s). If you have received this email in error, please notify the sender and permanently delete this email, its content, and any attachment(s). Any disclosure, copying, or taking of any action in reliance on an email received in error is strictly prohibited.
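To put a number on the under-utilization described in this reply, here is a minimal sketch. It assumes that equal num_tokens means every node holds an equal slice of the data. The 8 TiB nodes follow the 2 x 4 TiB example from the original question. The 16 TiB nodes and the 48 TiB of total replicated data are made-up figures, and `disk_fill` is a hypothetical helper name.

```python
def disk_fill(total_data_tib: float, capacities_tib: list[float]) -> list[float]:
    """Fraction of each node's disk in use when all nodes hold an equal share.

    Assumes identical num_tokens everywhere, so data is split evenly by node
    count and not by capacity.
    """
    per_node = total_data_tib / len(capacities_tib)
    return [per_node / capacity for capacity in capacities_tib]


# 12 original nodes with 8 TiB each, plus 4 hypothetical new nodes with 16 TiB.
capacities = [8.0] * 12 + [16.0] * 4
fills = disk_fill(total_data_tib=48.0, capacities_tib=capacities)
print(f"small nodes: {fills[0]:.1%} full, large nodes: {fills[-1]:.1%} full")
```

Each node ends up with the same 3 TiB, so the larger disks sit at half the fill level of the smaller ones. The cluster's usable capacity is still bounded by the smallest nodes.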
RE: Configuration for new(expanding) cluster and new admins.
I have run clusters with nodes of different disk sizes by using different num_tokens values. I used the basic math of just increasing num_tokens by the same percentage as the change in disk size. (So, if my "normal" node was 8 tokens, one with double the disk space would be 16.)

One thing to watch/consider: the larger (number of tokens) * (number of nodes) gets, the harder repairs have to work.

Sean R. Durity

-----Original Message-----
From: Marc Hoppins
Sent: Wednesday, June 15, 2022 3:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Configuration for new(expanding) cluster and new admins.

Hi all,

Say we have 2 datacentres with 12 nodes in each. All hardware is the same.

4-core, 2 x HDD (eg, 4TiB)

num_tokens = 16 as a start point

If a plan is to gradually increase the nodes per DC, and new hardware will have more of everything, especially storage, I assume I increase the num_tokens value. Should I have started with a lower value?

What would be considered as a good adjustment for:

Any increase in number of HDD for any node?

Any increase in capacity per HDD for any node?

Is there any direct correlation between new token count and the proportional increase in either quantity of devices or total capacity, or is any adjustment purely arbitrary just to differentiate between varied nodes?

Thanks

M
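The "basic math" in this message can be written down as a short sketch. Both helper names are hypothetical. Rounding to the nearest whole token is an assumption, since the message only gives the doubling example. For mixed token counts, the tokens-times-nodes figure is taken as the sum over all nodes.

```python
def scaled_num_tokens(base_tokens: int, base_disk_tib: float, new_disk_tib: float) -> int:
    """Scale num_tokens by the same ratio as the change in disk size."""
    return max(1, round(base_tokens * new_disk_tib / base_disk_tib))


def total_vnodes(tokens_per_node: list[int]) -> int:
    """Total token ranges in the ring, the quantity that makes repairs work harder."""
    return sum(tokens_per_node)


# The example from the message: a node with double the disk gets double the tokens.
print(scaled_num_tokens(base_tokens=8, base_disk_tib=4.0, new_disk_tib=8.0))

# Hypothetical mixed cluster: 12 nodes at 16 tokens plus 4 nodes at 32 tokens.
print(total_vnodes([16] * 12 + [32] * 4))
```

Because num_tokens should not be changed on a host that has already joined, a calculation like this only applies when sizing new hosts before they bootstrap.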
RE: Configuration for new(expanding) cluster and new admins.
Thanks for that info.

I did see in the documentation that a value of 16 was not recommended for >50 hosts. Our existing HBase cluster has 76 regionservers, so I would imagine that (eventually) we will see a similar figure.

There will be some scenarios where an initial setup may have (eg) 2 x 8 HDD and future expansion adds either more HDD or newer nodes with larger storage. It can’t be guaranteed that the storage would double; it might increase by less than 2x, or by 3-4x the existing amount, resulting in a heterogeneous storage configuration. In these cases, how would it affect efficiency if the token figure were the same across all nodes?

From: Elliott Sims
Sent: Thursday, June 16, 2022 12:24 AM
To: user@cassandra.apache.org
Subject: Re: Configuration for new(expanding) cluster and new admins.

If you set a different num_tokens value for new hosts (the value should never be changed on an existing host), the amount of data moved to that host will be proportional to the num_tokens value. So, if the new hosts are set to 32 when they're added to the cluster, those hosts will get twice as much data as the initial 16-token hosts.

I think it's generally advised to keep a Cassandra cluster identical in terms of hardware and num_tokens, at least within a DC. I suspect having a lot of different values would slow down Reaper significantly, but I've had decent results so far adding a few hosts with beefier hardware and num_tokens=32 to an existing 16-token cluster.
Re: Configuration for new(expanding) cluster and new admins.
If you set a different num_tokens value for new hosts (the value should never be changed on an existing host), the amount of data moved to that host will be proportional to the num_tokens value. So, if the new hosts are set to 32 when they're added to the cluster, those hosts will get twice as much data as the initial 16-token hosts.

I think it's generally advised to keep a Cassandra cluster identical in terms of hardware and num_tokens, at least within a DC. I suspect having a lot of different values would slow down Reaper significantly, but I've had decent results so far adding a few hosts with beefier hardware and num_tokens=32 to an existing 16-token cluster.

On Wed, Jun 15, 2022 at 1:33 AM Marc Hoppins wrote:

> Hi all,
>
> Say we have 2 datacentres with 12 nodes in each. All hardware is the same.
>
> 4-core, 2 x HDD (eg, 4TiB)
>
> num_tokens = 16 as a start point
>
> If a plan is to gradually increase the nodes per DC, and new hardware will have more of everything, especially storage, I assume I increase the num_tokens value. Should I have started with a lower value?
>
> What would be considered as a good adjustment for:
>
> Any increase in number of HDD for any node?
>
> Any increase in capacity per HDD for any node?
>
> Is there any direct correlation between new token count and the proportional increase in either quantity of devices or total capacity, or is any adjustment purely arbitrary just to differentiate between varied nodes?
>
> Thanks
>
> M
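The proportionality described in this reply can be illustrated with a few lines of arithmetic. This is a sketch, not a statement about Cassandra internals: it assumes ownership is exactly proportional to num_tokens, whereas with randomly allocated tokens real ownership is only approximately so. The host names and the helper `expected_shares` are hypothetical.

```python
from fractions import Fraction


def expected_shares(num_tokens: dict[str, int]) -> dict[str, Fraction]:
    """Expected fraction of the data per host, proportional to its num_tokens."""
    total = sum(num_tokens.values())
    return {host: Fraction(tokens, total) for host, tokens in num_tokens.items()}


# 12 existing hosts at 16 tokens, plus 2 hypothetical new hosts at 32 tokens.
cluster = {f"old-{i}": 16 for i in range(1, 13)}
cluster.update({"new-1": 32, "new-2": 32})

shares = expected_shares(cluster)
print("old host share:", shares["old-1"])
print("new host share:", shares["new-1"])
```

A 32-token host comes out with exactly twice the share of a 16-token host, which matches the "twice as much data" statement in the reply.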
Re: Configuration for new(expanding) cluster and new admins.
You shouldn't need to change num_tokens at all. num_tokens helps you pretend your cluster is bigger than it is, and randomly selects tokens for you so that your data is approximately evenly distributed. As you add more hosts, it should balance out automatically.

The alternative to num_tokens is to use a single token per host, explicitly calculate it each time to ensure the cluster is properly balanced, and then use `nodetool move` each time you add hosts to the cluster to redistribute load. num_tokens makes it less likely that you end up imbalanced, so you shouldn't need to move any tokens manually.

On Wed, Jun 15, 2022 at 12:34 AM Marc Hoppins wrote:

> Hi all,
>
> Say we have 2 datacentres with 12 nodes in each. All hardware is the same.
>
> 4-core, 2 x HDD (eg, 4TiB)
>
> num_tokens = 16 as a start point
>
> If a plan is to gradually increase the nodes per DC, and new hardware will have more of everything, especially storage, I assume I increase the num_tokens value. Should I have started with a lower value?
>
> What would be considered as a good adjustment for:
>
> Any increase in number of HDD for any node?
>
> Any increase in capacity per HDD for any node?
>
> Is there any direct correlation between new token count and the proportional increase in either quantity of devices or total capacity, or is any adjustment purely arbitrary just to differentiate between varied nodes?
>
> Thanks
>
> M
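For the single-token alternative mentioned in this reply, the explicit calculation might look like the sketch below. It assumes the Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1, and a single datacenter; the thread does not say which partitioner is in use. The function name `evenly_spaced_tokens` is hypothetical.

```python
def evenly_spaced_tokens(node_count: int) -> list[int]:
    """One token per node, spaced evenly across the Murmur3 token range.

    Assumes Murmur3Partitioner (tokens from -2**63 to 2**63 - 1).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count
    return [i * step - 2**63 for i in range(node_count)]


# Tokens for a 12-node ring, then for the same ring grown to 13 nodes.
before = evenly_spaced_tokens(12)
after = evenly_spaced_tokens(13)
unchanged = set(before) & set(after)
print("tokens that stay put when growing from 12 to 13 nodes:", len(unchanged))
```

Growing the ring changes the ideal spacing for nearly every node, which is why the single-token approach needs `nodetool move` after each expansion, and why num_tokens avoids that manual step.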