I think what he means is that if users have existing clusters with
num_tokens=256 (the current default) and the default changes to 32, the node
won't ignore the value; it will fail to start with an error saying you cannot
change from one num_tokens value to another:
ERROR [main] 2020-01-22 17:10:53,159 CassandraDaemon.java:759 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the number of tokens from 256 to 32
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1035) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:717) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:651) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:742) [apache-cassandra-3.11.5.jar:3.11.5]

For that, he's correct.  I was thinking of when you change initial_token in
the yaml.  If you change the initial_token value(s) and there's already
data on disk, the node will happily start and simply log that it is using
the saved tokens it already has, thank you very much.  So it doesn't fail
to start; it just ignores the change.  I had thought changing num_tokens
behaved the same way, but I was wrong.

So there are two scenarios for changing the num_tokens default:
1) People need to be made aware of the change, because if they upgrade
their existing nodes to 4.0+ without changing from the defaults, the node
will fail to start because of this change.
2) As Jeff mentions, if they add a new node to their cluster with the new
default while the existing nodes use 256, the new node will claim just
1/8th as much data as the other nodes.
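To put rough numbers on scenario 2: with vnodes, a node's ownership is
approximately proportional to its share of the cluster's total tokens.  A
back-of-the-envelope sketch (assuming evenly distributed tokens, which
random allocation only approximates) looks like this:

```python
# Rough model: ownership is approximately proportional to a node's
# share of the cluster's total token count.
def approx_ownership(tokens_per_node):
    total = sum(tokens_per_node)
    return [t / total for t in tokens_per_node]

# Three existing nodes at the old default (256) plus one new node at 32.
shares = approx_ownership([256, 256, 256, 32])
new_node, existing = shares[-1], shares[0]

# The new node holds 1/8th as much data as each existing node.
ratio = new_node / existing  # 32/256 == 0.125
```

The imbalance is per-node, not per-cluster: the new node is not an eighth
of the whole ring, it just carries an eighth of what each 256-token peer
carries.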

For 1, I would hope, especially given all of the new changes in 4.0, that
people would read the release notes and do their due diligence: "I'm going
to diff my config against the defaults of the version it corresponds to
and then modify the new config as appropriate."  If they don't and the
value differs, they'll find out quickly because the node fails fast.  I'm
not as worried about this one.
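That config diff is mechanical; here's a toy sketch using Python's difflib
(the yaml fragments are made-up stand-ins for the operator's real 3.x
cassandra.yaml and the stock 4.0 one):

```python
import difflib

# Stand-in fragments; in practice these would be the full contents of
# the operator's current cassandra.yaml and the new version's default.
old_yaml = """num_tokens: 256
compaction_throughput_mb_per_sec: 16
""".splitlines()
new_yaml = """num_tokens: 32
compaction_throughput_mb_per_sec: 64
""".splitlines()

diff = list(difflib.unified_diff(old_yaml, new_yaml,
                                 fromfile="cassandra.yaml.3.11",
                                 tofile="cassandra.yaml.4.0",
                                 lineterm=""))
print("\n".join(diff))
```

Any changed default shows up as a -/+ pair, so a num_tokens change would
be hard to miss in the output.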

For 2, it's a good point, and again, hopefully people do the same due
diligence when adding new nodes to their clusters on the new version so
that they aren't surprised by the data density imbalance.  In a practical
sense, the ops person has presumably already upgraded their cluster to 4.0
before adding a new node running 4.0.  So if they weren't aware of the
changed default in the yaml file, they would have hit the error mentioned
previously during the upgrade and would then know about it for any new
nodes added to their cluster.  Similarly, num_tokens is set explicitly in
the shipped yaml (256 today), so it isn't a matter of the value being
commented out and silently falling back to whatever the code determines as
the default, which is the common case.  In that sense, I'm happy that it's
not something that changes underneath you just because you never set it.
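For reference, this is roughly what the shipped file looks like (values as
in the 3.x default yaml; a 4.0 file would presumably ship the new value
the same way):

```yaml
# cassandra.yaml: num_tokens is uncommented in the shipped file, so an
# upgraded node carries its old explicit value rather than silently
# picking up a new code-level default.
num_tokens: 256
```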

So while I agree that the consequence of adding a mismatched node can be
severe, the cluster operator will already have become aware of the change
when upgrading their existing nodes, even if they didn't read the release
notes, skipped the config due diligence, or simply missed it (which
happens as well; it's a big upgrade).  So if that's all there is, I don't
think the change will be disruptive beyond a surprise for anyone who
hadn't noticed it, and in that case it at least fails fast.

On Wed, Jan 22, 2020 at 5:02 PM Jeff Jirsa <jji...@gmail.com> wrote:

> On Tue, Jan 21, 2020 at 7:41 PM Jonathan Koppenhofer <j...@koppedomain.com>
> wrote:
>
> > If someone isn't explicitly setting vnodes, and the default changes, it
> > will vary from the number of assigned tokens for existing clusters,
> right?
> > Won't this cause the node to fail to start?
> >
>
> Nope. You can have 32 tokens on some instances and 256 in other instances
> in the same dc/cluster. No error. The hosts with 256 tokens will just have
> 8x as much data as the hosts with 32 tokens. And that's why changing
> defaults is hard.
>
>
>
> >
> > I am in favor of changing these defaults, but should provide very clear
> > guidance on vnodes (unless I am wrong).
> >
> > I'm sure there are others that would be safe to change. I'll review our
> > defaults we typically set and report back tomorrow.
> >
> > On Tue, Jan 21, 2020, 7:22 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
> > wrote:
> >
> > > I mentioned this in the contributor meeting as a topic to bring up on
> the
> > > list - should we take the opportunity to update defaults for Cassandra
> > 4.0?
> > >
> > > The rationale is two-fold:
> > > 1) There are best practices and tribal knowledge around certain
> > properties
> > > where people just know to update those properties immediately as a
> > starting
> > > point.  If it's pretty much a given that we set something as a starting
> > > point different than the current defaults, why not make that the new
> > > default?
> > > 2) We should align the defaults with what we test with.  There may be
> > > exceptions if we have one-off tests but on the whole, we should be
> > testing
> > > with defaults.
> > >
> > > As a starting point, compaction throughput and number of vnodes seem
> like
> > > good candidates but it would be great to get feedback for any others.
> > >
> > > For compaction throughput (
> > > https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a
> basic
> > > case on the ticket to default to 64 just as a starting point because
> the
> > > decision for 16 was made when spinning disk was most common.  Hence
> most
> > > people I know change that and I think without too much bikeshedding, 64
> > is
> > > a reasonable starting point.  A case could be made that empirically the
> > > compaction throughput throttle may have less effect than many people
> > think,
> > > but I still think an updated default would make sense.
> > >
> > > For number of vnodes, Michael Shuler made the point in the discussion
> > that
> > > we already test with 32, which is a far better number than the 256
> > > default.  I know many new users that just leave the 256 default and
> then
> > > discover later that it's better to go lower.  I think 32 is a good
> > > balance.  One could go lower with the new algorithm but I think 32 is
> > much
> > > better than 256 without being too skewed, and it's what we currently
> > test.
> > >
> > > Jeff brought up a good point that we want to be careful with defaults
> > > since changing them could come as an unpleasant surprise to people who
> > > don't explicitly set them.  As a general rule, we should always update
> > > release notes to clearly state that a default has changed.  For these
> two
> > > defaults in particular, I think it's safe.  For compaction throughput I
> > > think a release note is sufficient in case they want to modify it.  For
> > > number of vnodes, it won't affect existing deployments with data - it
> > would
> > > be for new clusters, which would honestly benefit from this anyway.
> > >
> > > The other point is whether it's too late to go into 4.0.  For these two
> > > changes, I think significant testing can still be done with these new
> > > defaults before release and I think testing more explicitly with 32
> > vnodes
> > > in particular will give people more confidence in the lower number
> with a
> > > wider array of testing (where we don't already use 32 explicitly).
> > >
> > > In summary, are people okay with considering updating these defaults
> and
> > > possibly others in the alpha stage of a new major release?  Are there
> > other
> > > properties to consider?
> > >
> > > Jeremy
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
>
