[ https://issues.apache.org/jira/browse/CASSANDRA-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-15521:
-------------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Triage Needed)

> Update default for num_tokens from 256 to something more reasonable
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-15521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15521
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Virtual Nodes
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Normal
>
> The default for num_tokens, the number of token ranges assigned to a node
> using virtual nodes, is way too high. 256 token ranges make repair painful.
> Since it's a default, someone new to Cassandra won't know better and, if it
> is left unchanged, they will have to live with it or perform a migration to
> a new datacenter with a lower number.
> At the same time, going too low with the default allocation algorithm can
> create hotspots, with some nodes responsible for more data than others.
> A new token allocation algorithm has been introduced, but it is not the
> default.
> The proposal of this ticket is to set the default to something more
> reasonable that aligns with best practices, without using the new token
> allocation algorithm or assigning specific token values as some do. 32 is
> a good compromise and is what the project uses in a lot of its tests.
> So generally it would be good to move to a more sane value and to align with
> testing, so users can be more confident that the defaults have a lot of
> testing behind them.
> As discussed on the dev mailing list, we want to make sure this change to the
> default doesn't come as an unpleasant surprise to cluster operators. For
> num_tokens specifically, if you upgrade to a version with the new default
> and don't set it back to the existing value, the node will not start,
> reporting that you can't change num_tokens on an existing node.
> So we will want to add a release note telling operators to take note of the
> num_tokens change when reviewing the new configuration during an upgrade.
> Along with not being able to start nodes, which is fail-fast, there is the
> matter of adding new nodes to the cluster. You can certainly add a new node
> to a cluster or datacenter with a different number of token ranges assigned.
> It will give that node a different amount of data to be responsible for. For
> example, if the nodes in a datacenter all have num_tokens=256 (current
> default) and you add a node to that datacenter with num_tokens=32 (new
> default), it will claim only 1/8th as many token ranges, and roughly 1/8th
> as much data, as each of the other nodes in that datacenter. Fortunately,
> this is a property that is explicitly defined rather than implicit like
> some of the table settings. Also, most if not all operators will upgrade
> the existing nodes to the new version before trying to add a node with that
> version. So if the existing nodes have a different value for num_tokens,
> they'll be aware of it immediately.
> In any case, this is a long proposal for what will be a small change in the
> cassandra.yaml and something in the release notes, that is, changing the
> default num_tokens value from 256 to 32.
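
The 1/8th figure in the description can be checked with a short sketch. It
assumes, as a simplification that is not stated in the ticket, that a node's
expected share of the ring is proportional to its num_tokens; real ownership
also depends on the allocation algorithm and replication settings. The
function name expected_ownership and the three-node datacenter are
hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def expected_ownership(num_tokens_per_node):
    """Return each node's expected share of the ring.

    Assumption (simplification): a node's share is proportional to its
    num_tokens. Actual ownership varies with token placement.
    """
    total = sum(num_tokens_per_node)
    return [Fraction(n, total) for n in num_tokens_per_node]


# Hypothetical datacenter: three existing nodes at the old default (256),
# plus one new node started with the proposed default (32).
existing_nodes = [256, 256, 256]
shares = expected_ownership(existing_nodes + [32])

# Share of the new node relative to any one existing node.
ratio = shares[-1] / shares[0]
print(ratio)  # 1/8
```

The ratio does not depend on how many existing nodes there are, since the
total cancels out; only the two num_tokens values matter.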