I've put a lot of my previous clients on 4 tokens, and every one of those changes resulted in a major improvement.
I wouldn't use any more than 4 except under some pretty unusual circumstances.

Jon

On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead <b...@instaclustr.com> wrote:

> +1 to reducing the number of tokens as low as possible for availability
> issues. 4 lgtm
>
> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi <djo...@apache.org> wrote:
>
> > Thanks for restarting this discussion Jeremy. I personally think 4 is a
> > good number as a default. I think whatever we pick, we should have enough
> > documentation for operators to make sense of the new defaults in 4.0.
> >
> > Dinesh
> >
> > > On Jan 28, 2020, at 9:25 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> > >
> > > I wanted to start a discussion about the default for num_tokens that
> > > we'd like for people starting in Cassandra 4.0. This is for ticket
> > > CASSANDRA-13701 <https://issues.apache.org/jira/browse/CASSANDRA-13701>
> > > (which has been duplicated a number of times, most recently by me).
> > >
> > > TLDR, based on availability concerns, skew concerns, operational
> > > concerns, and based on the fact that the new allocation algorithm can be
> > > configured fairly simply now, this is a proposal to go with 4 as the new
> > > default and the allocate_tokens_for_local_replication_factor set to 3.
> > > That gives a good experience out of the box for people and is the most
> > > conservative. It does assume that racks and DCs have been configured
> > > correctly. We would, of course, go into some detail in the NEWS.txt.
> > >
> > > Joey Lynch and Josh Snyder did an extensive analysis of availability
> > > concerns with high num_tokens/virtual nodes in their paper
> > > <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> > > This worsens as clusters grow larger. I won't quote the paper here but in
> > > order to have a conservative default and with the accompanying new
> > > allocation algorithm, I think it makes sense as a default.
> > >
> > > The difficulties have always been that virtual nodes have been
> > > beneficial for operations but that 256 is too high for the purposes of
> > > repair and as Joey and Josh cover, for availability. Going lower with the
> > > original allocation algorithm has produced skew in allocation in its naive
> > > distribution. Enter CASSANDRA-7032
> > > <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new token
> > > allocation algorithm. CASSANDRA-15260
> > > <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> > > algorithm operationally simpler.
> > >
> > > One other item of note - since Joey and Josh's analysis, there have been
> > > improvements in streaming and other considerations that can reduce the
> > > probability of more than one node representing some token range being
> > > unavailable, but it would still be good to be conservative.
> > >
> > > Please chime in with any concerns with having num_tokens=4 and
> > > allocate_tokens_for_local_replication_factor=3 and the accompanying
> > > rationale so we can improve the experience for all users.
> > >
> > > Other resources:
> > > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> > > https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> > > https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
>
> --
> Ben Bromhead
> Instaclustr | www.instaclustr.com | @instaclustr
> <http://twitter.com/instaclustr> | (650) 284 9692
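
For anyone who wants a rough feel for the skew point raised in the quoted proposal (low token counts with naive allocation produce uneven ownership), here is a minimal Python sketch. It is not Cassandra's code. It assumes purely random token placement on a 64-bit ring as a stand-in for the original allocation behaviour, counts primary-range ownership only, and ignores replication, racks, and DCs. The names `ownership_skew`, `average_skew`, and `RING_SIZE`, and the 12-node cluster size, are made up for illustration.

```python
import random

RING_SIZE = 2**64  # hypothetical ring size, chosen only for illustration


def ownership_skew(num_nodes, num_tokens, rng):
    """Return the largest node's ownership divided by the ideal 1/num_nodes share."""
    # Assumption: every node picks num_tokens uniformly random ring positions.
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    # Each token owns the range from the previous token (exclusive) to itself.
    owned = [0] * num_nodes
    prev = tokens[-1][0] - RING_SIZE  # wrap around from the last token on the ring
    for position, node in tokens:
        owned[node] += position - prev
        prev = position

    ideal = RING_SIZE / num_nodes
    return max(owned) / ideal


def average_skew(num_nodes, num_tokens, trials=50, seed=42):
    """Average the skew over several randomly generated rings."""
    rng = random.Random(seed)
    total = sum(ownership_skew(num_nodes, num_tokens, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for num_tokens in (4, 16, 256):
        print(num_tokens, round(average_skew(12, num_tokens), 2))
```

A value of 1.0 would mean perfectly even ownership; larger values mean the most loaded node owns proportionally more of the ring. Under these assumptions the random placement is noticeably less even at 4 tokens than at 256, which is the gap the new allocation algorithm from CASSANDRA-7032 is meant to close. The sketch does not model that algorithm.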