Larger clusters are where high token counts do the most damage, which is why this is such a problem. You start out with a small cluster using 256 tokens, and as you grow into the hundreds of nodes it becomes more and more unstable.
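
A rough way to see this, as a sketch rather than anything taken from the paper referenced in the thread: with randomly placed tokens, the number of other nodes a given node shares a token range with grows with num_tokens. The Python here assumes a deliberately simplified ring (random tokens, replicas are the next distinct nodes clockwise, no racks or DCs, RF=3). The function names are made up for illustration, and it does not model the new allocation algorithm.

```python
import random


def build_ring(num_nodes, num_tokens, rng):
    """Give every node num_tokens random tokens; return sorted (token, node) pairs."""
    ring = [(rng.random(), node)
            for node in range(num_nodes)
            for _ in range(num_tokens)]
    ring.sort()
    return ring


def replica_sets(ring, rf):
    """For each token range, walk clockwise and take the first rf distinct nodes."""
    size = len(ring)
    result = []
    for start in range(size):
        replicas = []
        pos = start
        while len(replicas) < rf:
            node = ring[pos % size][1]
            if node not in replicas:
                replicas.append(node)
            pos += 1
        result.append(frozenset(replicas))
    return result


def mean_neighbors(num_nodes, num_tokens, rf=3, seed=0):
    """Average number of distinct other nodes that share a range with a node."""
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    neighbors = {node: set() for node in range(num_nodes)}
    for replicas in replica_sets(build_ring(num_nodes, num_tokens, rng), rf):
        for node in replicas:
            neighbors[node] |= replicas - {node}
    return sum(len(peers) for peers in neighbors.values()) / num_nodes


if __name__ == "__main__":
    for tokens in (4, 16, 256):
        print(tokens, "tokens:", round(mean_neighbors(100, tokens), 1), "neighbors")
```

Under these assumptions, 256 tokens on 100 nodes makes nearly every other node a replica neighbor, so losing any two nodes at the same time is likely to leave some range with a single live replica. With 4 tokens each node shares ranges with only a small subset of the cluster.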

On Thu, Jan 30, 2020, 8:19 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote:

> Shouldn't we consider the cluster size to configure num_tokens?
>
> For example is it OK to use num_tokens=4 for a cluster of more than 100
> nodes?
>
> Another question that is not so much relevant to this:
>
> When we use the token assignment algorithm (the new/non-random one) for a
> specific keyspace, why should we use initial token for all the seeds, isn't
> one seed enough and then just set the keyspace for all other nodes?
>
> Also I do not understand why we should consider rack topology and number
> of racks for configuration of num_tokens?
>
> ---- On Thu, 30 Jan 2020 04:33:57 +0330 Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote ----
>
> The new default wouldn't be retroactively set for 3.x, but the same
> principles apply. The new algorithm is in 3.x as well as the
> simplification of the configuration. So no reason not to use the same
> configuration on 3.x.
>
> On Jan 30, 2020, at 4:34 AM, Chen-Becker, Derek
> <mailto:dchen...@amazon.com.INVALID> wrote:
> >
> > Does the same guidance apply to 3.x clusters? I read through the JIRA
> > ticket linked below, along with tickets that it links to, but it's not
> > clear that the new allocation algorithm is available in 3.x or if there
> > are other reasons that this would be problematic.
> >
> > Thanks,
> >
> > Derek
> >
> > On 1/29/20, 9:54 AM, "Jon Haddad" <mailto:j...@jonhaddad.com> wrote:
> >
> > I've put a lot of my previous clients on 4 tokens, all of which have
> > resulted in a major improvement.
> >
> > I wouldn't use any more than 4 except under some pretty unusual
> > circumstances.
> >
> > Jon
> >
> > On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead
> > <mailto:b...@instaclustr.com> wrote:
> >
> >> +1 to reducing the number of tokens as low as possible for availability
> >> issues. 4 lgtm
> >>
> >> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi <mailto:djo...@apache.org>
> >> wrote:
> >>
> >>> Thanks for restarting this discussion Jeremy. I personally think 4 is a
> >>> good number as a default. I think whatever we pick, we should have
> >>> enough documentation for operators to make sense of the new defaults
> >>> in 4.0.
> >>>
> >>> Dinesh
> >>>
> >>>> On Jan 28, 2020, at 9:25 PM, Jeremy Hanna
> >>>> <mailto:jeremy.hanna1...@gmail.com> wrote:
> >>>>
> >>>> I wanted to start a discussion about the default for num_tokens that
> >>>> we'd like for people starting in Cassandra 4.0. This is for ticket
> >>>> CASSANDRA-13701
> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-13701>
> >>>> (which has been duplicated a number of times, most recently by me).
> >>>>
> >>>> TLDR, based on availability concerns, skew concerns, operational
> >>>> concerns, and based on the fact that the new allocation algorithm can
> >>>> be configured fairly simply now, this is a proposal to go with 4 as
> >>>> the new default and the allocate_tokens_for_local_replication_factor
> >>>> set to 3. That gives a good experience out of the box for people and
> >>>> is the most conservative. It does assume that racks and DCs have been
> >>>> configured correctly. We would, of course, go into some detail in the
> >>>> NEWS.txt.
> >>>>
> >>>> Joey Lynch and Josh Snyder did an extensive analysis of availability
> >>>> concerns with high num_tokens/virtual nodes in their paper
> >>>> <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> >>>> This worsens as clusters grow larger. I won't quote the paper here
> >>>> but in order to have a conservative default and with the accompanying
> >>>> new allocation algorithm, I think it makes sense as a default.
> >>>>
> >>>> The difficulties have always been that virtual nodes have been
> >>>> beneficial for operations but that 256 is too high for the purposes
> >>>> of repair and as Joey and Josh cover, for availability. Going lower
> >>>> with the original allocation algorithm has produced skew in
> >>>> allocation in its naive distribution. Enter CASSANDRA-7032
> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new
> >>>> token allocation algorithm. CASSANDRA-15260
> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> >>>> algorithm operationally simpler.
> >>>>
> >>>> One other item of note - since Joey and Josh's analysis, there have
> >>>> been improvements in streaming and other considerations that can
> >>>> reduce the probability of more than one node representing some token
> >>>> range being unavailable, but it would still be good to be
> >>>> conservative.
> >>>>
> >>>> Please chime in with any concerns with having num_tokens=4 and
> >>>> allocate_tokens_for_local_replication_factor=3 and the accompanying
> >>>> rationale so we can improve the experience for all users.
> >>>>
> >>>> Other resources:
> >>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >>>> https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> >>>> https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: mailto:dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: mailto:dev-h...@cassandra.apache.org
> >>
> >> --
> >>
> >> Ben Bromhead
> >>
> >> Instaclustr | www.instaclustr.com | @instaclustr
> >> <http://twitter.com/instaclustr> | (650) 284 9692