[
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718936#comment-17718936
]
Maxim Muzafarov commented on CASSANDRA-17292:
---------------------------------------------
Hello [~maedhroz],
I'd like to offer a few thoughts on that as well. I have been delving into the
configuration usages while making the SettingsTable virtual table updatable and
I think we can support nested structured configuration (with some limitations)
as well as the flat one including storing the configuration in multiple files.
Why can we do this? Well, we should move away from modelling the configuration
as nested POJO classes and instead store the configuration properties
internally in a tree-based runtime data structure (the same concept is provided
by Apache Commons Configuration, Lightbend Config, etc.). This will give users
a good deal of flexibility, so they can structure or split their configuration
as they wish.
I also mentioned some thoughts here in the {{= Alternatives =}} section, with
all the drawbacks we might face:
[https://lists.apache.org/thread/gdtr3vp375d3nyj6h8xo7owth1s556lz]
h3. Why can we do so?
*The first thing* to note is that there is no need to map the yaml file
structure directly to the POJO configuration classes, as these classes are not
directly available to users and are only used in internal components. The only
requirement is that we must clearly define the configuration properties on
which the naming convention is to be based: sub-components must be properly
prefixed (we can align properties using the @Replaces annotation or a mapping).
So a user can use any of the configuration layouts below; we just need to load
the configuration into our internal structure (or a POJO class) with an
appropriate YamlLoader.
This is valid:
{code:java}
commitlog_directory: String
commitlog_max_compression_buffers_in_pool: int
commitlog_periodic_queue_size: int
{code}
This is also valid:
{code:java}
commitlog:
  directory: String
  max_compression_buffers_in_pool: int
  periodic_queue_size: int
{code}
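To illustrate why both layouts can be treated as equivalent, here is a minimal sketch that flattens a nested map (as a YAML parser might produce it) into prefixed property names. The class name {{ConfigFlattener}} and the rule of joining levels with '_' are assumptions made for this example, not existing Cassandra code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigFlattener {
    // Flattens a nested map into prefixed keys joined by '_'.
    // The '_' separator is an assumed naming rule for this sketch.
    public static Map<String, Object> flatten(Map<String, Object> tree) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", tree, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                // A nested section: descend and extend the prefix.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                flattenInto(key, child, out);
            } else {
                // A leaf value: store it under its fully prefixed name.
                out.put(key, e.getValue());
            }
        }
    }
}
```

With this rule, a nested {{commitlog}} section and the flat {{commitlog_*}} keys end up as the same set of property names, and an already flat map passes through unchanged.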
This is also valid if we split the configuration into multiple files and put
them on the classpath to load:
{code:java}
// Let's assume Cassandra configurations yaml has 'cassandra.(.*).yaml' pattern.
cassandra.accord.yaml
cassandra.yaml
{code}
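A rough sketch of how such files could be selected and combined is shown below. The exact regular expression, the class name {{ConfigFiles}}, and the policy of rejecting a property defined in two files are all assumptions for illustration; the inputs are already-parsed flat property maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ConfigFiles {
    // Assumed pattern: "cassandra.yaml" or "cassandra.<name>.yaml".
    private static final Pattern CONFIG_FILE = Pattern.compile("cassandra(\\.[^.]+)?\\.yaml");

    // Keeps only the file names that look like Cassandra configuration files.
    public static List<String> select(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames)
            if (CONFIG_FILE.matcher(name).matches())
                selected.add(name);
        return selected;
    }

    // Merges parsed property maps; a property defined twice is an error (assumed policy).
    public static Map<String, Object> merge(List<Map<String, Object>> parsedFiles) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> file : parsedFiles)
            for (Map.Entry<String, Object> e : file.entrySet())
                if (merged.putIfAbsent(e.getKey(), e.getValue()) != null)
                    throw new IllegalArgumentException("Duplicate property: " + e.getKey());
        return merged;
    }
}
```

Whether a later file should override an earlier one, or a duplicate should fail as here, is a design decision that would need to be agreed on.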
*The second thing* to note is how the whole configuration can be validated. I
guess the answer here is relatively simple - we can reuse all the apply methods
we have now (applySSTableFormats(), applySimpleConfig(), applyPartitioner())
keeping them almost 'as is'.
*The third thing* is that if we use a runtime tree-based structure to configure
the Cassandra cluster, we are able to inject a configuration subtree right
where it is needed, for example with @Configuration(prefix="commitlog"). There
will then be no need to keep a layer of thousands of lines, such as the
DatabaseDescriptor class, in the source code to access the configuration. Of
course, we will keep it to minimise the initial changes, but eventually we can
get rid of it.
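As a sketch of what handing a component only its own subtree could look like, the hypothetical {{ConfigTree}} class below selects all properties under a prefix and strips that prefix. The class, its methods and the '_' separator are assumptions for this example; an annotation such as @Configuration would be a thin layer on top of a lookup like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigTree {
    private final Map<String, Object> properties;

    public ConfigTree(Map<String, Object> properties) {
        this.properties = new LinkedHashMap<>(properties);
    }

    // Returns the properties under "<prefix>_" with the prefix stripped.
    public ConfigTree subtree(String prefix) {
        String p = prefix + "_";
        Map<String, Object> sub = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : properties.entrySet())
            if (e.getKey().startsWith(p))
                sub.put(e.getKey().substring(p.length()), e.getValue());
        return new ConfigTree(sub);
    }

    public boolean contains(String name) {
        return properties.containsKey(name);
    }

    public int getInt(String name) {
        Object value = properties.get(name);
        if (!(value instanceof Integer))
            throw new IllegalArgumentException("Not an int property: " + name);
        return (Integer) value;
    }
}
```

A commit log component would then receive {{tree.subtree("commitlog")}} and read {{periodic_queue_size}} without knowing about any other part of the configuration.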
{*}Last but not least{*}, we should think carefully about the performance of
accessing configuration fields, as this could affect the performance of the
cluster as a whole. Direct class field access is the fastest way to read a
property value, but I think for the Cassandra project O(1) lookups might be
acceptable. Some frameworks cache configuration variables under the hood. For
example, Netflix/archaius does this
[https://github.com/Netflix/archaius/blob/2.x/archaius2-core/src/main/java/com/netflix/archaius/DefaultPropertyFactory.java#L213],
but Commons Configuration doesn't seem to. If we go this way we will have to
run benchmarks, but I think it will be fast enough, within measurement error.
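The caching idea can be sketched as follows: the raw string is parsed once and later reads are a single hash lookup. This is only an illustration with a hypothetical {{CachedProperties}} class; it does not reproduce how Archaius implements its cache, and it assumes each property is always read with the same parser.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CachedProperties {
    private final Map<String, String> raw;
    // Parsed values, keyed by property name.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    public CachedProperties(Map<String, String> raw) {
        this.raw = new ConcurrentHashMap<>(raw);
    }

    // Parses the raw value on first access; later calls return the cached object.
    @SuppressWarnings("unchecked")
    public <T> T get(String name, Function<String, T> parser) {
        return (T) cache.computeIfAbsent(name, key -> {
            String value = raw.get(key);
            if (value == null)
                throw new IllegalArgumentException("Unknown property: " + key);
            return parser.apply(value);
        });
    }
}
```

A real implementation would also need to invalidate the cached entry when a property is updated at runtime, which matters if the SettingsTable becomes updatable.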
h3. Tree-based configuration frameworks
Many frameworks store configuration in a runtime tree-based structure and
might be considered for Cassandra, for example [Apache Commons
Configuration|https://github.com/apache/commons-configuration], [Lightbend
Config|https://github.com/lightbend/config], and [Netflix
Archaius|https://github.com/Netflix/archaius]. As I mentioned in the
{{=Alternatives=}} section, we can consider adding Apache Commons
Configuration. Adding something from 'apache commons' looks safer than adding a
completely different configuration framework, as we already have some
libraries from 'commons'.
But whatever framework we consider, the following things need to be taken into
account:
- We have custom configuration datatypes such as DataStorageSpec;
- We have a custom DurationSpec type, so we either migrate it to Duration,
preserving backwards compatibility for all supported APIs (yaml, JMX), or
extend the chosen framework with new types; in the latter case we have to
provide data type converters;
- An additional dependency, so the key component (configuration) of the
project becomes dependent on an external library version;
- We have to deal with configuration defaults calculated during initialisation
and preserve backward compatibility with the current yaml-file structure.
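To give an idea of what a data type converter for the duration case might involve, here is a simplified sketch that turns strings such as "10s" or "500ms" into java.time.Duration. The class name {{DurationConverter}} and the restricted set of units are assumptions for illustration; it does not reproduce the actual DurationSpec parsing rules, and how it would plug into a given framework depends on that framework's converter API.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationConverter {
    // Simplified format: a non-negative integer followed by a unit.
    private static final Pattern FORMAT = Pattern.compile("(\\d+)(ms|s|m|h|d)");

    public static Duration parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Invalid duration: " + value);
        long quantity = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms": return Duration.ofMillis(quantity);
            case "s":  return Duration.ofSeconds(quantity);
            case "m":  return Duration.ofMinutes(quantity);
            case "h":  return Duration.ofHours(quantity);
            default:   return Duration.ofDays(quantity); // "d" is the only remaining unit
        }
    }
}
```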
In the end, I think we can prepare a POC with any of the frameworks we
consider, benchmark it and see how it goes. I can help with it as well.
> Move cassandra.yaml toward a nested structure around major database concepts
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new
> features") has made it clear we will gravitate toward appropriately nested
> structures for new parameters in {{cassandra.yaml}}, but from the scattered
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of
> this change include those we gain by doing it for new features (single point
> of interest for feature documentation, typed configuration objects, logical
> grouping for additional parameters added over time, discoverability, etc.),
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally,
> even a rough cut of a design here would allow that to move forward in a
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] -
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] -
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] -
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05 &
> https://gist.github.com/pauloricardomg/4369f4b0dd8b84421a11ae61bf2d2c7e
--
This message was sent by Atlassian Jira
(v8.20.10#820010)