Thanks for bringing this to the list, Ekaterina!

It’s worth noting that the two don’t have to be in conflict: we could offer two 
template yaml files with the parameters grouped differently, and let users 
decide for themselves.

The proposals primarily define parameter names differently, with my proposal 
going by kind->place, and the other proposal maintaining (mostly) the existing 
name form (which is a bit more like place->kind). While the example yaml groups 
by kind, you can convert nested definitions into a ‘dot’ form (e.g. 
limits.concurrency.reads) for use in a different grouping.
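
As a rough sketch of that nested-to-dot conversion (in Python, purely for illustration): apart from limits.concurrency.reads, the parameter names and all of the values below are made up, and this is not how Cassandra itself loads its config.

```python
def flatten(nested, prefix=""):
    """Yield (dotted_name, value) pairs for every leaf of a nested mapping."""
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the group, carrying the dotted prefix along.
            yield from flatten(value, name)
        else:
            yield name, value


# Hypothetical settings grouped by kind; names and values are illustrative only.
grouped = {
    "limits": {
        "concurrency": {"reads": 32, "writes": 32},
        "throughput": {"compaction": 64},
    }
}

flat = dict(flatten(grouped))
for name, value in flat.items():
    print(f"{name}: {value}")
```

Once every parameter has a flat dotted name, the same set can be listed in whatever order or grouping a given template file prefers.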

One advantage of grouping parameters together is that it helps keep naming 
coherent between systems, and potentially also permits a more succinct config 
file and better discovery. But it’s far from a silver bullet, as value 
judgements have to be made about where the grouping lines are. I’m sure 
anything we settle on will be a huge improvement over the status quo, however.




From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
Date: Thursday, 2 September 2021 at 16:32
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: [DISCUSS] CASSANDRA-15234
Hi team,

I would like to bring to the attention of the community CASSANDRA-15234,
standardise config and JVM parameters.

This is work we discussed back in Summer 2020, just before our first 4.0
Beta release. During the discussion we figured out that there is more than
one option to do the job and not enough time to get user feedback and
finish it, so this was delayed until post-4.0. And here I am, bringing it
back to the table.

This work’s goals are:

   - To standardize naming, which we did by agreeing to the form noun_verb.
   - To provide values with units while maintaining backward compatibility.


Those two parts are more or less already done.

More interesting is the third part - reorganizing the cassandra.yaml file.

My personal approach was to split it into sections, done here
<https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>.

Another proposal, by Benedict, is to group the config parameters.

To make it clearer, he created a yaml
<https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml>
with comments mostly stripped.

In his version, there are basic settings for network, disk, etc., all grouped
together, followed by operator tuneables mostly under limits within which
we now have throughput, concurrency, capacity. This leads to settings for
some features being kept separate (most notably for caching), but helps the
operator understand what they have to play with for controlling resource
consumption.

I am interested to hear what people think about the two options, and if
anyone has another idea to share, the discussion is open.

Thank you,

Ekaterina
