I mentioned this in the contributor meeting as a topic to bring up on the list: should we take the opportunity to update defaults for Cassandra 4.0?

The rationale is two-fold:

1) There are best practices and tribal knowledge around certain properties where people just know to update them immediately as a starting point. If it's pretty much a given that we set something differently than the current default, why not make that the new default?

2) We should align the defaults with what we test with. There may be exceptions for one-off tests, but on the whole we should be testing with defaults.

As a starting point, compaction throughput and number of vnodes seem like good candidates, but it would be great to get feedback on any others. (A sketch of the yaml deltas under discussion is at the end of this mail.)

For compaction throughput (https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic case on the ticket for defaulting to 64, since the decision for 16 was made when spinning disks were most common. Most people I know change it, and I think 64 is a reasonable starting point without too much bikeshedding. A case could be made that empirically the compaction throughput throttle has less effect than many people think, but I still think an updated default makes sense.

For number of vnodes, Michael Shuler made the point in the discussion that we already test with 32, which is a far better number than the 256 default. I know many new users who just leave the 256 default and only later discover that it's better to go lower. I think 32 is a good balance: one could go lower with the new token allocation algorithm, but 32 is much better than 256 without skewing token ownership too much, and it's what we currently test with.

Jeff brought up a good point that we need to be careful with defaults, since changing them can come as an unpleasant surprise to people who don't set them explicitly. As a general rule, we should always update the release notes to clearly state that a default has changed. For these two defaults in particular, I think it's safe. For compaction throughput, a release note is sufficient in case people want to modify it. For number of vnodes, it won't affect existing deployments with data; it only applies to new clusters, which would honestly benefit from it anyway.

The other question is whether it's too late to go into 4.0. For these two changes, significant testing can still be done with the new defaults before release, and testing more explicitly with 32 vnodes in particular will give people more confidence in the lower number across a wider array of testing (where we don't already use 32 explicitly).

In summary: are people okay with considering updates to these defaults, and possibly others, in the alpha stage of a new major release? Are there other properties to consider?

Jeremy
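
P.S. For concreteness, here is a sketch of the cassandra.yaml deltas being proposed (property names as they appear in the shipped yaml; the values are the starting points suggested above, not a final decision):

    # cassandra.yaml (proposed 4.0 defaults, per this thread)
    compaction_throughput_mb_per_sec: 64    # currently 16, a value chosen when spinning disks were common
    num_tokens: 32                          # currently 256; 32 is what we already test with

Note that existing clusters can already try the throughput value at runtime with "nodetool setcompactionthroughput 64", and num_tokens only takes effect when bootstrapping new nodes, which is why the vnode change is safe for deployments with existing data.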
The rationale is two-fold: 1) There are best practices and tribal knowledge around certain properties where people just know to update those properties immediately as a starting point. If it's pretty much a given that we set something as a starting point different than the current defaults, why not make that the new default? 2) We should align the defaults with what we test with. There may be exceptions if we have one-off tests but on the whole, we should be testing with defaults. As a starting point, compaction throughput and number of vnodes seem like good candidates but it would be great to get feedback for any others. For compaction throughput (https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic case on the ticket to default to 64 just as a starting point because the decision for 16 was made when spinning disk was most common. Hence most people I know change that and I think without too much bikeshedding, 64 is a reasonable starting point. A case could be made that empirically the compaction throughput throttle may have less effect than many people think, but I still think an updated default would make sense. For number of vnodes, Michael Shuler made the point in the discussion that we already test with 32, which is a far better number than the 256 default. I know many new users that just leave the 256 default and then discover later that it's better to go lower. I think 32 is a good balance. One could go lower with the new algorithm but I think 32 is much better than 256 without being too skewed, and it's what we currently test. Jeff brought up a good point that we want to be careful with defaults since changing them could come as an unpleasant surprise to people who don't explicitly set them. As a general rule, we should always update release notes to clearly state that a default has changed. For these two defaults in particular, I think it's safe. For compaction throughput I think a release not is sufficient in case they want to modify it. For number of vnodes, it won't affect existing deployments with data - it would be for new clusters, which would honestly benefit from this anyway. The other point is whether it's too late to go into 4.0. For these two changes, I think significant testing can still be done with these new defaults before release and I think testing more explicitly with 32 vnodes in particular will give people more confidence in the lower number with a wider array of testing (where we don't already use 32 explicitly). In summary, are people okay with considering updating these defaults and possibly others in the alpha stage of a new major release? Are there other properties to consider? Jeremy --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org