Re: [Discuss] num_tokens default in Cassandra 4.0

Jon Haddad Thu, 30 Jan 2020 06:58:47 -0800

Larger clusters are where high token counts do the most damage, which is
why it's such a problem. You start out with a small cluster using 256
tokens, and as you grow into the hundreds of nodes it becomes more and more
unstable.
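
A rough way to see this, as a sketch under simplifying assumptions rather
than a model of a real cluster: tokens are placed at random (the old
allocation behaviour), racks and DCs are ignored, and replicas are simply
the next rf distinct nodes clockwise on the ring. The function name and
parameters below are made up for illustration. It counts how many other
nodes each node shares at least one token range with, which is the set of
nodes whose simultaneous failure could cost a range its quorum.

```python
import random


def average_replica_neighbors(num_nodes, num_tokens, rf=3, seed=0):
    """Average number of other nodes that each node shares a token range with.

    Illustrative model only: random token placement, no rack or DC awareness.
    """
    if num_nodes < rf or num_tokens < 1:
        raise ValueError("need num_nodes >= rf and num_tokens >= 1")
    rng = random.Random(seed)
    # Each node picks num_tokens random positions; sort them into ring order.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owners = [node for _, node in ring]
    neighbors = [set() for _ in range(num_nodes)]
    ring_size = len(owners)
    for start in range(ring_size):
        # Walk clockwise from this token, collecting the first rf distinct nodes.
        replicas = []
        i = start
        while len(replicas) < rf:
            node = owners[i % ring_size]
            if node not in replicas:
                replicas.append(node)
            i += 1
        # Every pair of nodes in one replica set shares that range.
        for a in replicas:
            for b in replicas:
                if a != b:
                    neighbors[a].add(b)
    return sum(len(s) for s in neighbors) / num_nodes
```

Comparing average_replica_neighbors(200, 4) with
average_replica_neighbors(200, 256) shows the effect. In this model a node
with 4 tokens and rf=3 can share ranges with at most 4 * 2 * (rf - 1) = 16
other nodes no matter how large the cluster is, while with 256 tokens a
node in a cluster of a few hundred ends up sharing ranges with nearly every
other node.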



On Thu, Jan 30, 2020, 8:19 AM onmstester onmstester
<[email protected]> wrote:

> Shouldn't we consider the cluster size to configure num_tokens?
>
> For example, is it OK to use num_tokens=4 for a cluster of more than 100
> nodes?
>
>
>
> Another question, not closely related to this:
>
> When we use the token assignment algorithm (the new/non-random one) for a
> specific keyspace, why should we use an initial token for all the seeds?
> Isn't one seed enough, after which we just set the keyspace for all other
> nodes?
>
>
>
> Also, I do not understand why we should consider rack topology and the
> number of racks when configuring num_tokens.
>
>
>
>
>
> ---- On Thu, 30 Jan 2020 04:33:57 +0330 Jeremy Hanna <
> [email protected]> wrote ----
>
>
> The new default wouldn't be retroactively set for 3.x, but the same
> principles apply.  The new algorithm is in 3.x as well as the
> simplification of the configuration.  So no reason not to use the same
> configuration on 3.x.
>
> > On Jan 30, 2020, at 4:34 AM, Chen-Becker, Derek <mailto:
> [email protected]> wrote:
> >
> > Does the same guidance apply to 3.x clusters? I read through the JIRA
> > ticket linked below, along with tickets that it links to, but it's not
> > clear that the new allocation algorithm is available in 3.x or if there
> > are other reasons that this would be problematic.
> >
> > Thanks,
> >
> > Derek
> >
> > On 1/29/20, 9:54 AM, "Jon Haddad" <mailto:[email protected]> wrote:
> >
> >    I've put a lot of my previous clients on 4 tokens, all of which have
> >    resulted in a major improvement.
> >
> >    I wouldn't use any more than 4 except under some pretty unusual
> >    circumstances.
> >
> >    Jon
> >
> >    On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead <mailto:
> [email protected]> wrote:
> >
> >> +1 to reducing the number of tokens as low as possible for availability
> >> issues. 4 lgtm
> >>
> >> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi <mailto:[email protected]>
> wrote:
> >>
> >>> Thanks for restarting this discussion Jeremy. I personally think 4 is a
> >>> good number as a default. I think whatever we pick, we should have enough
> >>> documentation for operators to make sense of the new defaults in 4.0.
> >>>
> >>> Dinesh
> >>>
> >>>> On Jan 28, 2020, at 9:25 PM, Jeremy Hanna <mailto:
> [email protected]>
> >>> wrote:
> >>>>
> >>>> I wanted to start a discussion about the default for num_tokens that
> >>>> we'd like for people starting in Cassandra 4.0.  This is for ticket
> >>>> CASSANDRA-13701 <https://issues.apache.org/jira/browse/CASSANDRA-13701>
> >>>> (which has been duplicated a number of times, most recently by me).
> >>>>
> >>>> TLDR, based on availability concerns, skew concerns, operational
> >>>> concerns, and the fact that the new allocation algorithm can now be
> >>>> configured fairly simply, this is a proposal to go with 4 as the new
> >>>> default for num_tokens, with allocate_tokens_for_local_replication_factor
> >>>> set to 3.  That gives a good experience out of the box for people and is
> >>>> the most conservative.  It does assume that racks and DCs have been
> >>>> configured correctly.  We would, of course, go into some detail in the
> >>>> NEWS.txt.
> >>>>
> >>>> Joey Lynch and Josh Snyder did an extensive analysis of availability
> >>>> concerns with high num_tokens/virtual nodes in their paper
> >>>> <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> >>>> This worsens as clusters grow larger.  I won't quote the paper here, but
> >>>> in order to have a conservative default, and with the accompanying new
> >>>> allocation algorithm, I think 4 makes sense as a default.
> >>>>
> >>>> The difficulties have always been that virtual nodes have been
> >>>> beneficial for operations but that 256 is too high for the purposes of
> >>>> repair and, as Joey and Josh cover, for availability.  Going lower with
> >>>> the original allocation algorithm has produced skew in allocation in its
> >>>> naive distribution.  Enter CASSANDRA-7032
> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new token
> >>>> allocation algorithm.  CASSANDRA-15260
> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> >>>> algorithm operationally simpler.
> >>>>
> >>>> One other item of note - since Joey and Josh's analysis, there have
> >>>> been improvements in streaming and other considerations that can reduce
> >>>> the probability of more than one node representing some token range
> >>>> being unavailable, but it would still be good to be conservative.
> >>>>
> >>>> Please chime in with any concerns with having num_tokens=4 and
> >>>> allocate_tokens_for_local_replication_factor=3 and the accompanying
> >>>> rationale so we can improve the experience for all users.
> >>>>
> >>>> Other resources:
> >>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >>>> https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> >>>> https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> >>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: mailto:[email protected]
> >>> For additional commands, e-mail: mailto:[email protected]
> >>>
> >>>
> >>
> >> --
> >>
> >> Ben Bromhead
> >>
> >> Instaclustr | www.instaclustr.com | @instaclustr
> >> <http://twitter.com/instaclustr> | (650) 284 9692
> >>
> >
> >
>
