I mentioned this in the contributor meeting as a topic to bring up on the list 
- should we take the opportunity to update defaults for Cassandra 4.0?

The rationale is two-fold:
1) There are best practices and tribal knowledge around certain properties 
where experienced operators just know to change them immediately as a starting 
point.  If it's pretty much a given that we set something different from the 
current default, why not make that the new default?
2) We should align the defaults with what we test with.  There may be 
exceptions if we have one-off tests but on the whole, we should be testing with 
defaults.

As a starting point, compaction throughput and number of vnodes seem like good 
candidates, but it would be great to get feedback on any others.

For compaction throughput 
(https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic case 
on the ticket for defaulting to 64 MB/s, since the decision to use 16 was made 
when spinning disks were most common.  Most people I know change that value, 
and without too much bikeshedding, 64 is a reasonable starting point.  A case 
could be made that empirically the compaction throughput throttle has less 
effect than many people think, but I still think an updated default makes 
sense.
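
For reference, this would just be a change to the existing cassandra.yaml 
property (a sketch, assuming the property name stays as it is today):

    # Throttles compaction to this total system-wide throughput (MB/s).
    # The current default of 16 dates from when spinning disks were the
    # common case; 64 assumes SSD-class storage.
    compaction_throughput_mb_per_sec: 64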

For number of vnodes, Michael Shuler made the point in the discussion that we 
already test with 32, which is a far better number than the 256 default.  I 
know many new users who just leave the 256 default and then discover later 
that it's better to go lower.  I think 32 is a good balance.  One could go 
lower with the new token allocation algorithm, but 32 is much better than 256 
without risking the ownership skew you can get at very low token counts, and 
it's what we currently test with.
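
Again as a sketch of the yaml change (the keyspace name below is just a 
placeholder):

    # Number of tokens randomly assigned to this node on the ring.
    # Current default is 256; 32 matches what we already test with.
    num_tokens: 32

    # For anyone who wants the newer allocation algorithm instead of random
    # assignment, this existing property can be uncommented:
    # allocate_tokens_for_keyspace: my_keyspace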

Jeff brought up a good point that we want to be careful with defaults, since 
changing them could come as an unpleasant surprise to people who don't 
explicitly set them.  As a general rule, we should always update the release 
notes to clearly state that a default has changed.  For these two defaults in 
particular, I think it's safe.  For compaction throughput, a release note is 
sufficient in case people want to change it back.  For number of vnodes, it 
won't affect existing deployments with data - it only applies to new clusters, 
which would honestly benefit from this anyway.

The other point is whether it's too late to go into 4.0.  For these two 
changes, I think significant testing can still be done with the new defaults 
before release.  Testing with 32 vnodes by default, in particular, would 
broaden coverage beyond the tests that already set 32 explicitly and give 
people more confidence in the lower number.

In summary, are people okay with considering updating these defaults and 
possibly others in the alpha stage of a new major release?  Are there other 
properties to consider?

Jeremy