>
> We should be using the default value that benefits the most people, rather
> than an arbitrary compromise.

I'd caution we're talking about the default value *we believe* will benefit
the most people according to our respective understandings of C* usage.

> Most clusters don't shrink, they stay the same size or grow. I'd say 90%
> or more fall in this category.

While I intuitively agree with the "most don't shrink, they stay the same
or grow" claim, the 4 vs. 16 debate hinges on a distinction that claim
glosses over: what ratio we think stays the same size versus what ratio we
think grows. That distinction is what informs this discussion.

There's a *lot* of Cassandra out in the world, and these changes are going
to impact all of it. I'm not advocating a certain position on 4 vs. 16, but
I do think we need to be very careful about how strongly we hold our
beliefs and present them as facts in discussions like this.

For my unsolicited .02, it sounds an awful lot like we're stuck between a
rock and a hard place: there is no correct "one size fits all" answer here.
Said another way, both 4 and 16 are correct, just for different cases, and
we don't know / don't agree on which case is the right one to target. So
perhaps a discussion on a smart evolution of token allocation counts based
on quantized tiers of cluster size and dataset growth (either automated or
through operational best practices) would be valuable along with this.
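As an illustration only, the "quantized tiers" idea might look something
like the sketch below. The tier boundaries and the helper name are invented
for illustration, not a recommendation or anything that exists in Cassandra:

```python
def suggested_num_tokens(expected_nodes_per_dc: int) -> int:
    """Map an expected datacenter size to a vnode count.

    Hypothetical tiers: the cutoffs below are made up for the sketch.
    """
    if expected_nodes_per_dc < 12:
        return 16   # small clusters: prioritise balance
    if expected_nodes_per_dc < 48:
        return 8    # mid-size clusters: middle ground
    return 4        # large clusters: prioritise streaming/repair speed

print(suggested_num_tokens(6))    # prints 16
print(suggested_num_tokens(100))  # prints 4
```

The same mapping could equally live in operational runbooks rather than
code; the point is only that the default could vary with expected size
instead of being a single number.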

On Fri, Jan 31, 2020 at 8:57 AM Alexander Dejanovski <a...@thelastpickle.com>
wrote:

> While I (mostly) understand the maths behind using 4 vnodes as a default
> (which really is a question of extreme availability), I don't think they
> provide noticeable performance improvements over using 16, while 16 vnodes
> will protect folks from imbalances. It is very hard to deal with unbalanced
> clusters, and people often only start dealing with it once some nodes are
> already close to full. Operationally, it's far from trivial.
> We're going to make some experiments at bootstrapping clusters with 4
> tokens on the latest alpha to see how much balance we can expect, and how
> removing one node could impact it.
>
> If we're talking about repairs, using 4 vnodes will generate overstreaming,
> which can create lots of serious performance issues. Even on clusters with
> 500GB of node density, we never use less than ~15 segments per node with
> Reaper.
> Not everyone uses Reaper, obviously, and there will be no protection
> against overstreaming with such a low default for folks not using subrange
> repairs.
> On small clusters, even with 256 vnodes, using Cassandra 3.0/3.x and Reaper
> already allows you to get good repair performance, because token ranges
> sharing the exact same replicas will be processed in a single repair
> session. On large clusters, I reckon it's good to have far fewer vnodes to
> speed up repairs.
>
> Cassandra 4.0 is supposed to aim at providing a rock-stable release of
> Cassandra, fixing past instabilities, and I think lowering the default to 4
> tokens defeats that purpose.
> 16 tokens is a reasonable compromise for clusters of all sizes, without
> being too aggressive. Those with enough C* experience can still lower that
> number for their clusters.
>
> Cheers,
>
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> On Fri, Jan 31, 2020 at 1:41 PM Mick Semb Wever <m...@apache.org> wrote:
>
> >
> > > TLDR, based on availability concerns, skew concerns, operational
> > > concerns, and based on the fact that the new allocation algorithm can
> > > be configured fairly simply now, this is a proposal to go with 4 as the
> > > new default and the allocate_tokens_for_local_replication_factor set to
> > > 3.
> >
> >
> > I'm uncomfortable going with the default of `num_tokens: 4`.
> > I would rather see a default of `num_tokens: 16` based on the following…
> >
> > a) 4 num_tokens does not provide a good out-of-the-box experience.
> > b) 4 num_tokens doesn't provide any significant streaming benefits
> > over 16.
> > c) edge-case availability doesn't trump (a) & (b)
> >
> >
> > For (a)…
> >  The first node in each rack, up to RF racks, in each datacenter can't
> > use the allocation strategy. With 4 num_tokens, 3 racks and RF=3, the
> > first three nodes will be poorly balanced. If three poorly balanced
> > nodes in a cluster is an issue (because the cluster is small enough),
> > then 4 is the wrong default. From our own experience, we have had to
> > bootstrap these nodes multiple times until they generate something ok.
> > In practice 4 num_tokens (over 16) has caused more headache with
> > clients than gain.
> >
> > Elaborating, 256 was originally chosen because the token randomness
> > over that many always averaged out. With a default of
> > `allocate_tokens_for_local_replication_factor: 3` this issue is largely
> > solved, but you will still have those initial nodes with randomly
> > generated tokens. Ref:
> > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/tokenallocator/ReplicationAwareTokenAllocator.java#L80
> > And to be precise: tokens are randomly generated until there is a node
> > in each rack, up to RF racks. So, if you have RF=3, in theory (or if
> > you are a newbie) you could boot 100 nodes in only the first two racks,
> > and they will all have random tokens regardless of the
> > allocate_tokens_for_local_replication_factor setting.
> >
> > For example, using 4 num_tokens, 3 racks and RF=3…
> >  - in a 6 node cluster, there's a total of 24 tokens, half of which
> > are random,
> >  - in a 9 node cluster, there's a total of 36 tokens, a third of which
> > are random,
> >  - etc
> >
> > Following this logic, I would not be willing to apply 4 unless you
> > know there will be more than 36 nodes in each datacenter, i.e. less
> > than ~8% of your tokens randomly generated. Many clusters don't have
> > that size, and imho that's why 4 is a bad default.
> >
> > A default of 16, by the same logic, only needs 9 nodes in each dc to
> > overcome that degree of randomness.
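The arithmetic above is easy to sanity-check with a quick sketch, assuming
(as described) that the first min(RF, n) nodes in a datacenter bootstrap
with randomly generated tokens before the allocation strategy can kick in:

```python
def random_tokens(nodes_per_dc: int, num_tokens: int, rf: int = 3):
    """Return (randomly generated tokens, total tokens) for one datacenter.

    Assumes the first min(rf, nodes_per_dc) nodes get random tokens, as
    described in the mail above; every node holds num_tokens tokens.
    """
    random = min(rf, nodes_per_dc) * num_tokens
    total = nodes_per_dc * num_tokens
    return random, total

# 4 num_tokens, 3 racks, RF=3:
print(random_tokens(6, 4))    # (12, 24)  -> half the tokens are random
print(random_tokens(9, 4))    # (12, 36)  -> a third are random
print(random_tokens(36, 4))   # (12, 144) -> ~8% are random
```

These match the 6-node, 9-node and 36-node figures quoted above.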
> >
> > The workaround to all this is having to manually define
> > `initial_token: …` on those initial nodes. I'm really not keen on
> > imposing that upon new users.
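For what it's worth, evenly spaced tokens for Murmur3Partitioner (token
range -2^63 to 2^63 - 1) can be computed rather than hand-picked. A rough
sketch; the interleaved assignment is one common convention, not anything
prescribed by Cassandra:

```python
def initial_tokens(num_nodes: int, num_tokens: int):
    """Evenly spaced Murmur3 tokens, interleaved across nodes so each
    node's ranges are spread around the ring."""
    total = num_nodes * num_tokens
    step = 2**64 // total
    ring = [-2**63 + i * step for i in range(total)]
    # node i takes every num_nodes-th token starting at position i
    return [ring[i::num_nodes] for i in range(num_nodes)]

# Emit comma-separated values suitable for cassandra.yaml's initial_token
for node, toks in enumerate(initial_tokens(3, 4)):
    print(f"node {node}: initial_token: {','.join(map(str, toks))}")
```

This only helps the very first nodes; once a node exists in each of RF
racks, the allocation strategy takes over as described above.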
> >
> > For (b)…
> >  there have been a number of improvements already around streaming
> > that solve much of whatever difference there is between 4 and 16
> > num_tokens. And 4 num_tokens means bigger token ranges, so it could
> > well be disadvantageous due to over-streaming.
> >
> > For (c)…
> >  we are trying to optimise availability in situations where we can
> > never guarantee availability. I understand it's a nice operational
> > advantage to have in a shit-show, but it's not something you can
> > design for and rely upon. There's also the question of availability
> > vs the size of the token-range that becomes unavailable.
> >
> >
> >
> > regards,
> > Mick
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>
