I think it's a good idea to take a step back and get a high level view of
the problem we're trying to solve.

First, high token counts result in decreased availability, as each node has
data overlap with more nodes in the cluster.  Specifically, a node can
share data with up to (RF-1) * 2 * num_tokens other nodes.  So a 256 token cluster at RF=3 is
going to almost always share data with every other node in the cluster that
isn't in the same rack, unless you're doing something wild like using more
than a thousand nodes in a cluster.  We advertise

With 16 tokens, that is vastly improved, but you still have up to 64 nodes
that each node needs to query against, so you're again hitting every node
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
wouldn't use 16 here, and I doubt any of you would either.  I've advocated
for 4 tokens because you'd have overlap with only 16 nodes, which works
well for small clusters as well as large.  Assuming I was creating a new
cluster for myself (in a hypothetical brand new application I'm building) I
would put this in production.  I have worked with several teams where I
helped them put 4 token clusters in prod and it has worked very well.  We
didn't see any wild imbalance issues.
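
To make the arithmetic above concrete, here is a rough sketch.  The function
names are made up for illustration and nothing here is Cassandra code; it
only evaluates the (RF-1) * 2 * num_tokens upper bound, and assumes nodes are
spread evenly across racks and that replicas always land in other racks.

```python
import math


def max_overlapping_nodes(num_tokens, rf=3):
    """Upper bound on how many other nodes one node can share data with."""
    return (rf - 1) * 2 * num_tokens


def cluster_size_threshold(num_tokens, rf=3, racks=3):
    """Approximate cluster size below which a node overlaps with every
    node outside its own rack (assumes evenly sized racks, racks >= 2)."""
    if racks < 2:
        raise ValueError("this estimate assumes at least 2 racks")
    peers = max_overlapping_nodes(num_tokens, rf)
    # Only (racks - 1) / racks of the cluster sits outside a node's own rack.
    return math.ceil(peers * racks / (racks - 1))


if __name__ == "__main__":
    for tokens in (256, 16, 4):
        print(tokens,
              max_overlapping_nodes(tokens),
              cluster_size_threshold(tokens))
```

With RF=3 and 3 racks this gives 1024, 64 and 16 overlapping nodes for 256,
16 and 4 tokens, which is where the ~96 node figure for 16 tokens comes from.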

As Mick's pointed out, our current method of using random token assignment
by default is problematic for 4 tokens.  I fully agree with
this, and I think if we were to try to use 4 tokens, we'd want to address
this in tandem.  We can discuss how to better allocate tokens by default
(something more predictable than random), but I'd like to avoid the
specifics of that for the sake of this email.

To Alex's point, repairs are problematic with lower token counts due to
over streaming.  I think this is a pretty serious issue and I think we'd have to
address it before going all the way down to 4.  This, in my opinion, is a
more complex problem to solve and I think trying to fix it here could make
shipping 4.0 take even longer, something none of us want.

For the sake of shipping 4.0 without adding extra overhead and time, I'm ok
with moving to 16 tokens, and in the process adding extensive documentation
outlining what we recommend for production use.  I think we should also try
to figure out something better than random as the default to fix the data
imbalance issues.  I've got a few ideas here I've been noodling on.

As long as folks are fine with potentially changing the default again in C*
5.0 (after another discussion / debate), 16 is enough of an improvement
that I'm OK with the change, and willing to author the docs to help people
set up their first cluster.  For folks that go into production with the
defaults, we're at least not setting them up for total failure once their
clusters get large like we are now.

In future versions, we'll probably want to address the issue of data
imbalance by building something in that shifts individual tokens around.  I
don't think we should try to do this in 4.0 either.

Jon



On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
wrote:

> I think Mick and Anthony make some valid operational and skew points for
> smaller/starting clusters with 4 num_tokens. There’s an arbitrary line
> between small and large clusters but I think most would agree that most
> clusters are on the small to medium side. (A small nuance is afaict the
> probabilities have to do with quorum on a full token range, ie it has to do
> with the size of a datacenter not the full cluster.)
>
> As I read this discussion I’m personally more inclined to go with 16 for
> now. It’s true that if we could fix the skew and topology gotchas for those
> starting things up, 4 would be ideal from an availability perspective.
> However we’re still in the brainstorming stage for how to address those
> challenges. I think we should create tickets for those issues and go with
> 16 for 4.0.
>
> This is about an out of the box experience. It balances availability,
> operations (such as skew and general bootstrap friendliness and
> streaming/repair), and cluster sizing. Balancing all of those, I think for
> now I’m more comfortable with 16 as the default with docs on considerations
> and tickets to unblock 4 as the default for all users.
>
> >>> On Feb 1, 2020, at 6:30 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> >> On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch <joe.e.ly...@gmail.com>
> wrote:
> >> I think that we might be bikeshedding this number a bit because it is
> easy
> >> to debate and there is not yet one right answer.
> >
> >
> > https://www.youtube.com/watch?v=v465T5u9UKo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
