[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

Maxim Muzafarov (Jira) Wed, 03 May 2023 06:32:37 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718936#comment-17718936
 ]


Maxim Muzafarov commented on CASSANDRA-17292:
---------------------------------------------

Hello [~maedhroz],



I'd like to offer a few thoughts on this as well. I have been digging into how 
the configuration is used while making the SettingsTable virtual table 
updatable, and I think we can support a nested, structured configuration (with 
some limitations) alongside the flat one, including storing the configuration 
in multiple files. How can we do this? We should move away from representing 
the configuration as nested POJO classes and instead store the configuration 
properties internally in a tree-based runtime data structure (the same concept 
is provided by Apache Commons Configuration, Lightbend Config, etc.). This will 
give users a good deal of flexibility, so they can structure or split their 
configuration as they wish.

I also mentioned some thoughts here in the {{= Alternatives =}} section, with 
all the drawbacks we might face:
[https://lists.apache.org/thread/gdtr3vp375d3nyj6h8xo7owth1s556lz]
h3. Why can we do this?

*The first thing* to note is that there is no need to map the yaml file 
structure directly to the POJO configuration classes, as these classes are not 
directly available to users and are only used by internal components. The only 
requirement is that we clearly define the configuration properties on which 
the naming convention is based: sub-components must be properly prefixed (we 
can align properties using the @Replaces annotation or a mapping).

So a user can use any of the configuration layouts below; we just need to load 
the configuration into our internal structure (or a POJO class) with an 
appropriate YamlLoader.

This is valid:
{code:yaml}
commitlog_directory: String
commitlog_max_compression_buffers_in_pool: int
commitlog_periodic_queue_size: int
{code}
This is also valid:
{code:yaml}
commitlog:
  directory: String
  max_compression_buffers_in_pool: int
  periodic_queue_size: int
{code}
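
A minimal sketch of how the two layouts above could be normalised to the same property names. The ConfigFlattener class and the rule of joining path segments with '_' are assumptions made for illustration, not existing Cassandra code, and the sample values are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: turns a parsed YAML map into
// flat property names so that nested and flat layouts look the same.
final class ConfigFlattener
{
    static Map<String, Object> flatten(Map<String, Object> source)
    {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", source, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out)
    {
        for (Map.Entry<String, Object> e : node.entrySet())
        {
            // Assumed naming rule: parent and child names are joined with '_'.
            String key = prefix.isEmpty() ? e.getKey() : prefix + '_' + e.getKey();
            if (e.getValue() instanceof Map)
                flattenInto(key, (Map<String, Object>) e.getValue(), out);
            else
                out.put(key, e.getValue());
        }
    }

    public static void main(String[] args)
    {
        Map<String, Object> commitlog = new LinkedHashMap<>();
        commitlog.put("directory", "/tmp/commitlog"); // sample value
        commitlog.put("periodic_queue_size", 1024);   // sample value
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("commitlog", commitlog);

        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("commitlog_directory", "/tmp/commitlog");
        flat.put("commitlog_periodic_queue_size", 1024);

        // Both layouts end up with the same keys and values.
        System.out.println(flatten(nested).equals(flatten(flat)));
    }
}
```

With such a step in the loader, the rest of the code would only ever see one canonical set of property names, whichever layout the user chose.
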
This is a valid case if we split the configuration into multiple files and put 
them in the classpath to load:
{code:java}
// Let's assume Cassandra configurations yaml has 'cassandra.(.*).yaml' pattern.
cassandra.accord.yaml
cassandra.yaml
{code}
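
As a rough sketch of the file selection step: read literally as a regular expression, 'cassandra.(.*).yaml' would not match plain cassandra.yaml, so the pattern below makes the middle part optional. The ConfigFileNames class and the exact pattern are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: picks the configuration files to load from a list of names.
final class ConfigFileNames
{
    // Assumed pattern: "cassandra.yaml" plus any "cassandra.<name>.yaml".
    static final Pattern CONFIG_FILE = Pattern.compile("cassandra(\\.[^.]+)*\\.yaml");

    static List<String> select(List<String> candidates)
    {
        List<String> selected = new ArrayList<>();
        for (String name : candidates)
            if (CONFIG_FILE.matcher(name).matches()) // matches() tests the whole name
                selected.add(name);
        return selected;
    }
}
```

The selected files would then be loaded one by one and merged into the same internal structure; how conflicts between files are resolved is left open here.
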
*The second thing* to note is how the whole configuration can be validated. I 
think the answer here is relatively simple: we can reuse all the apply methods 
we have now (applySSTableFormats(), applySimpleConfig(), applyPartitioner()), 
keeping them almost 'as is'.

*The third thing* is that if we use a runtime tree-based structure to configure 
the Cassandra cluster, we are able to inject a configuration subtree right 
where it is needed, for example with @Configuration(prefix="commitlog"). There 
will then be no need to keep a layer of thousands of lines, e.g. the 
DatabaseDescriptor class, in the source code to access the configuration. Of 
course, we will keep it to minimise the initial changes, but eventually we can 
get rid of it.
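
To illustrate the idea, here is a small sketch of subtree selection by prefix. The @Configuration annotation, CommitLogComponent and SubtreeInjector are all hypothetical names; nothing like this exists in Cassandra today, and a real implementation would work on the tree structure rather than on a flat map:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation naming the configuration subtree a component needs.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Configuration
{
    String prefix();
}

// Hypothetical component that only ever sees its own part of the configuration.
@Configuration(prefix = "commitlog")
final class CommitLogComponent
{
    final Map<String, Object> settings;

    CommitLogComponent(Map<String, Object> settings)
    {
        this.settings = settings;
    }
}

final class SubtreeInjector
{
    // Returns the properties under the prefix declared on the component,
    // with the prefix stripped from the keys.
    static Map<String, Object> subtreeFor(Class<?> component, Map<String, Object> all)
    {
        Configuration cfg = component.getAnnotation(Configuration.class);
        if (cfg == null)
            throw new IllegalArgumentException(component + " is not annotated with @Configuration");
        String prefix = cfg.prefix() + '_';
        Map<String, Object> subtree = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : all.entrySet())
            if (e.getKey().startsWith(prefix))
                subtree.put(e.getKey().substring(prefix.length()), e.getValue());
        return subtree;
    }
}
```

A component would then be created with something like new CommitLogComponent(SubtreeInjector.subtreeFor(CommitLogComponent.class, allProperties)) and would not need to go through a global accessor class.
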

{*}Last but not least{*}, we should think carefully about the performance of 
accessing configuration fields, as this could affect the performance of the 
cluster as a whole. Direct class field access is the fastest way to read a 
property value, but I think for the Cassandra project O(1) lookups might be 
acceptable. Some frameworks cache configuration variables under the hood. For 
example, Netflix/archaius does this 
[https://github.com/Netflix/archaius/blob/2.x/archaius2-core/src/main/java/com/netflix/archaius/DefaultPropertyFactory.java#L213],
but Commons Configuration doesn't seem to. If we go this way we will have to 
run benchmarks, but I think it will be fast enough, with any difference within 
measurement error.
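
As a sketch of what such caching could look like (this is not how archaius implements it; CachedConfig is a hypothetical class written only for illustration), parsed values can be kept in a hash map so that repeated reads are a single O(1) lookup and the raw string is parsed only once per change:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache in front of raw string properties.
final class CachedConfig
{
    private final Map<String, String> raw = new ConcurrentHashMap<>();
    private final Map<String, Object> parsed = new ConcurrentHashMap<>();

    void set(String key, String value)
    {
        raw.put(key, value);
        parsed.remove(key); // drop the cached value so the next read re-parses
    }

    // Assumes the key has been set and is always read with the same parser.
    @SuppressWarnings("unchecked")
    <T> T get(String key, Function<String, T> parser)
    {
        return (T) parsed.computeIfAbsent(key, k -> parser.apply(raw.get(k)));
    }
}
```

A hot path would call something like config.<Integer>get("commitlog_periodic_queue_size", Integer::parseInt). Whether this is close enough to a plain field read is exactly what the benchmarks would have to show.
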
h3. Tree-based configuration frameworks

There are a lot of frameworks that store configuration in a runtime tree-based 
structure that might be considered for Cassandra: [Apache Commons 
Configuration|https://github.com/apache/commons-configuration], [Lightbend 
Config|https://github.com/lightbend/config], [Netflix 
Archaius|https://github.com/Netflix/archaius]. As I mentioned in the 
{{= Alternatives =}} section, we can consider adding Apache Commons 
Configuration. Adding something from 'apache commons' looks safer than adding 
a completely different configuration framework, as we already have some 
libraries from 'commons'.

But whatever framework we consider, the following things need to be taken into 
account:
 - We have custom configuration datatypes such as DataStorageSpec;
 - We have a custom DurationSpec, so we either move to Duration, preserving 
backwards compatibility for all supported APIs (yaml, JMX), or extend the 
chosen framework with new types; in the latter case we have to provide data 
type converters;
 - An additional dependency, so the key component (configuration) of the 
project becomes dependent on an external library version;
 - We have to deal with configuration defaults calculated during initialisation 
and preserve backward compatibility with the current yaml-file structure.
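
To give an idea of what a data type converter from the list above might involve, here is a minimal sketch that turns values such as "500ms" or "2h" into java.time.Duration. DurationConverter is a hypothetical class, it is not Cassandra's DurationSpec parser, and the set of units it accepts is an assumption:

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter that a configuration framework could be extended with.
final class DurationConverter
{
    // Assumed syntax: a non-negative integer followed by one of a few unit suffixes.
    private static final Pattern VALUE = Pattern.compile("(\\d+)(ms|s|m|h|d)");

    static Duration convert(String text)
    {
        Matcher m = VALUE.matcher(text.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Unsupported duration: " + text);
        long quantity = Long.parseLong(m.group(1));
        switch (m.group(2))
        {
            case "ms": return Duration.ofMillis(quantity);
            case "s":  return Duration.ofSeconds(quantity);
            case "m":  return Duration.ofMinutes(quantity);
            case "h":  return Duration.ofHours(quantity);
            default:   return Duration.ofDays(quantity); // "d", the only unit left
        }
    }
}
```

Each framework has its own way of registering such converters, so the registration step is deliberately left out.
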

In the end, I think we can prepare a POC with each framework we consider, 
benchmark it and see how it goes. I can help with it as well.

> Move cassandra.yaml toward a nested structure around major database concepts
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - 
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05 & 
> https://gist.github.com/pauloricardomg/4369f4b0dd8b84421a11ae61bf2d2c7e


