Re: [Discuss] num_tokens default in Cassandra 4.0

Jon Haddad Wed, 29 Jan 2020 08:50:10 -0800

I've put a lot of my previous clients on 4 tokens, all of which have
resulted in a major improvement.


I wouldn't use any more than 4 except under some pretty unusual
circumstances.

Jon

On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead <[email protected]> wrote:

> +1 to reducing the number of tokens as low as possible for availability
> issues. 4 lgtm
>
> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi <[email protected]> wrote:
>
> > Thanks for restarting this discussion Jeremy. I personally think 4 is a
> > good number as a default. I think whatever we pick, we should have enough
> > documentation for operators to make sense of the new defaults in 4.0.
> >
> > Dinesh
> >
> > > On Jan 28, 2020, at 9:25 PM, Jeremy Hanna <[email protected]> wrote:
> > >
> > > > I wanted to start a discussion about the default for num_tokens that
> > > > we'd like for people starting in Cassandra 4.0.  This is for ticket
> > > > CASSANDRA-13701 <https://issues.apache.org/jira/browse/CASSANDRA-13701>
> > > > (which has been duplicated a number of times, most recently by me).
> > > >
> > > > TLDR, based on availability concerns, skew concerns, operational
> > > > concerns, and based on the fact that the new allocation algorithm can be
> > > > configured fairly simply now, this is a proposal to go with 4 as the new
> > > > default and the allocate_tokens_for_local_replication_factor set to 3.
> > > > That gives a good experience out of the box for people and is the most
> > > > conservative.  It does assume that racks and DCs have been configured
> > > > correctly.  We would, of course, go into some detail in the NEWS.txt.
> > >
> > > > Joey Lynch and Josh Snyder did an extensive analysis of availability
> > > > concerns with high num_tokens/virtual nodes in their paper
> > > > <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> > > > This worsens as clusters grow larger.  I won't quote the paper here but
> > > > in order to have a conservative default and with the accompanying new
> > > > allocation algorithm, I think it makes sense as a default.
> > >
> > > > The difficulties have always been that virtual nodes have been
> > > > beneficial for operations but that 256 is too high for the purposes of
> > > > repair and as Joey and Josh cover, for availability.  Going lower with
> > > > the original allocation algorithm has produced skew in allocation in its
> > > > naive distribution.  Enter CASSANDRA-7032
> > > > <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new token
> > > > allocation algorithm.  CASSANDRA-15260
> > > > <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> > > > algorithm operationally simpler.
> > >
> > > > One other item of note - since Joey and Josh's analysis, there have
> > > > been improvements in streaming and other considerations that can reduce
> > > > the probability of more than one node representing some token range
> > > > being unavailable, but it would still be good to be conservative.
> > >
> > > > Please chime in with any concerns with having num_tokens=4 and
> > > > allocate_tokens_for_local_replication_factor=3 and the accompanying
> > > > rationale so we can improve the experience for all users.
> > >
> > > > Other resources:
> > > >
> > > > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> > > >
> > > > https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> > > >
> > > > https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> > > >
>
> --
>
> Ben Bromhead
>
> Instaclustr | www.instaclustr.com | @instaclustr
> <http://twitter.com/instaclustr> | (650) 284 9692
>
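
For readers following the skew argument in Jeremy's message (going lower
than 256 with the original, random allocation produces uneven ownership),
here is a minimal Python sketch of that effect. It is not Cassandra code:
the function names ownership_skew and mean_skew, the 12-node cluster size,
and the trial count are all assumptions made for illustration. It models
primary range ownership only and ignores replication, racks, and the newer
allocation algorithm from CASSANDRA-7032.

```python
import random

# Size of the token ring; assumed here to be a 64-bit space for illustration.
RING_SIZE = 2**64


def ownership_skew(num_nodes, num_tokens, rng):
    """Return max/mean primary-range ownership when every node picks
    num_tokens tokens uniformly at random (hypothetical helper)."""
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    owned = [0] * num_nodes
    # Each token owns the range from the previous token up to itself;
    # index -1 handles the wrap-around for the first token on the ring.
    for i, (tok, node) in enumerate(tokens):
        prev = tokens[i - 1][0]
        owned[node] += (tok - prev) % RING_SIZE

    mean = sum(owned) / num_nodes
    return max(owned) / mean


def mean_skew(num_nodes, num_tokens, trials=50, seed=1):
    """Average the skew over several random rings to smooth out noise."""
    rng = random.Random(seed)
    total = sum(ownership_skew(num_nodes, num_tokens, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for vnodes in (4, 16, 256):
        print(vnodes, round(mean_skew(12, vnodes), 3))
```

A ratio near 1.0 means the most loaded node owns about its fair share. With
purely random tokens the ratio should sit noticeably further from 1.0 at 4
tokens than at 256, which is the gap the new allocation algorithm is meant
to close.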
