[ https://issues.apache.org/jira/browse/CASSANDRA-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-15521:
-------------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Triage Needed)

> Update default for num_tokens from 256 to something more reasonable
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-15521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15521
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Virtual Nodes
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Normal
>
> The default for num_tokens, the number of token ranges assigned to a node
> using virtual nodes, is way too high. 256 token ranges per node makes repair
> painful. Since it's a default, someone new to Cassandra won't know better,
> and if they leave it unchanged, they will have to live with it or migrate to
> a new datacenter with a lower number.
> At the same time, going too low with the default allocation algorithm can
> create hotspots, where some nodes own a larger share of the token ranges
> than others. A new token allocation algorithm has been introduced, but it
> is not the default.
> The proposal of this ticket is to set the default to something more
> reasonable that aligns with best practices, without relying on the new token
> allocation algorithm or on assigning specific token values, as some
> operators do. 32 is a good compromise and is what the project uses in many
> of its tests.
> So, in general, it would be good to move to a saner value and to align with
> testing, so users can be more confident that the defaults have a lot of
> testing behind them.
> As discussed on the dev mailing list, we want to make sure this change to the
> default doesn't come as an unpleasant surprise to cluster operators. For
> num_tokens specifically, if an operator upgrades to a version with the new
> default and doesn't change it back to the existing value, the node will not
> start, reporting that num_tokens can't be changed on an existing node. So we
> will want a release note telling operators to check the num_tokens change
> when reviewing the new configuration during an upgrade.
> Besides nodes failing to start, which is fail-fast, there is the matter of
> adding new nodes to the cluster. You can certainly add a new node to a
> cluster or datacenter with a different number of token ranges assigned; that
> node will simply be responsible for a different amount of data. For example,
> if the nodes in a datacenter all have num_tokens=256 (the current default)
> and you add a node to that datacenter with num_tokens=32 (the new default),
> it will claim only 1/8th as many token ranges, and roughly 1/8th as much
> data, as each of the other nodes in that datacenter. Fortunately, this
> property is explicitly defined rather than implicit like some of the table
> settings. Also, most if not all operators will upgrade the existing nodes to
> the new version before trying to add a node with that version, so if the
> existing nodes use a different num_tokens value, they'll be aware of it
> immediately.
> In any case, this is a long proposal for what will be a small change: setting
> the default num_tokens value in cassandra.yaml to 32 instead of 256, plus an
> entry in the release notes.

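The hotspot concern for low token counts can be illustrated with a small simulation. It assumes the default allocation picks each node's tokens at random, models the ring as the interval [0, 1) instead of the real partitioner token range, and ignores replication; each token owns the arc from the previous token up to itself. The function names, node count, and seed are hypothetical choices for illustration, not Cassandra code.

```python
import random

def ownership(nodes: int, num_tokens: int, rng: random.Random) -> list[float]:
    """Fraction of a [0, 1) ring owned by each node when every node picks
    num_tokens tokens uniformly at random (simplified model of random
    vnode allocation; replication is ignored)."""
    tokens = sorted((rng.random(), node)
                    for node in range(nodes) for _ in range(num_tokens))
    owned = [0.0] * nodes
    # The first token wraps around and owns the arc after the last token.
    prev = tokens[-1][0] - 1.0
    for pos, node in tokens:
        owned[node] += pos - prev  # this token owns (prev, pos]
        prev = pos
    return owned

def imbalance(nodes: int, num_tokens: int, seed: int = 42) -> float:
    """Largest node's share divided by the ideal share of 1/nodes."""
    shares = ownership(nodes, num_tokens, random.Random(seed))
    return max(shares) * nodes

if __name__ == "__main__":
    for t in (1, 4, 16, 32, 256):
        print(f"num_tokens={t:>3}: busiest node owns "
              f"{imbalance(12, t):.2f}x its fair share")
```

Exact numbers depend on the seed, but the busiest node's excess share shrinks as num_tokens grows, which is the trade-off the ticket weighs against repair cost when picking 32.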

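For the upgrade case, an operator could catch the problem before restarting a node by comparing the num_tokens value in the existing cassandra.yaml with the one in the new version's file. A minimal sketch, assuming num_tokens is set on a single uncommented, top-level line in each file; the function names are hypothetical, and a real tool would use a proper YAML parser.

```python
import re
from typing import Optional

# Matches an uncommented, top-level "num_tokens: <int>" line.
NUM_TOKENS_RE = re.compile(r"^num_tokens:[ \t]*(\d+)\b", re.MULTILINE)

def read_num_tokens(yaml_text: str) -> Optional[int]:
    """Return num_tokens from cassandra.yaml text, or None if not found."""
    match = NUM_TOKENS_RE.search(yaml_text)
    return int(match.group(1)) if match else None

def check_upgrade(old_yaml: str, new_yaml: str) -> Optional[str]:
    """Return a warning if the new config would change num_tokens."""
    old, new = read_num_tokens(old_yaml), read_num_tokens(new_yaml)
    if old is not None and new is not None and old != new:
        return (f"num_tokens changes from {old} to {new}; keep {old} in the "
                "new cassandra.yaml or the node will not start")
    return None

# Example: existing node on 256, new default file on 32.
print(check_upgrade("num_tokens: 256\n", "num_tokens: 32\n"))
```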

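The 1/8th figure follows from the token counts. Assuming a node's share is proportional to how many tokens it holds (true on average with random allocation, ignoring replication and rack placement), a node with 32 tokens joining N nodes with 256 tokens each expects 32 / (256N + 32) of the datacenter's ranges, versus 256 / (256N + 32) for each existing node. The function name and the 6-node example are hypothetical.

```python
from fractions import Fraction

def expected_shares(existing_nodes: int, existing_tokens: int,
                    new_tokens: int) -> tuple[Fraction, Fraction]:
    """Expected share of the datacenter's token ranges for
    (each existing node, the new node), assuming share ~ token count."""
    total = existing_nodes * existing_tokens + new_tokens
    return Fraction(existing_tokens, total), Fraction(new_tokens, total)

old_share, new_share = expected_shares(existing_nodes=6,
                                       existing_tokens=256, new_tokens=32)
print(old_share, new_share, new_share / old_share)  # 8/49 1/49 1/8
```

The ratio is 32/256 = 1/8 regardless of how many existing nodes there are; only the absolute shares depend on N.
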
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
