[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

Paulo Motta (Jira) Tue, 22 Feb 2022 12:30:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496302#comment-17496302
 ]


Paulo Motta edited comment on CASSANDRA-17292 at 2/22/22, 8:29 PM:
-------------------------------------------------------------------

Thanks for the additional context [~maedhroz], that is very helpful to 
understand the reasoning behind the proposed nesting.

{quote}For a moment, let's ignore the fact that there's any kind of textual 
configuration file at all for the project, but we still have all the 
knobs/systems/etc. The very first thing I would do is create a "domain model" 
for C* configuration on the Java side, a hierarchy rooted in a Configuration 
container class, which would contain members w/ types like 
ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These 
would be easy to navigate, would provide reasonable points for inline 
documentation, could encapsulate validation logic for relationships between 
parameters within subsystems and features, and could be passed as little 
"kernels" of configuration around the codebase, allowing for better mocking, 
etc.
{quote}

I think we're not very far from what we want the end result to look like from 
the developer's perspective, my proposal is just a simplification of yours 
where instead of a multi-level hierarchy rooted on physical resources 
(cluster/network/storage), I'm proposing a feature-centric domain model 
hierachy with a single level - each feature define its own configuration 
subtree.

The basic construct to create new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;
    
    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would allow to easily create typed configuration for each feature:
 * CommitlogConfiguration
 * HintsConfiguration
 * MaterializedViewsConfiguration

For example this is how "HintsConfiguration" would look like:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
   public HintsConfiguration()
   {
     this.enabled = true;
   } 

   public String getFeatureName()
   {
     return "hinted_handoff";
   }

   boolean auto_hints_cleanup = false
   Duration max_hint_window = "3h"
   Throttle hinted_handoff_throttle = "1024KiB"
   int max_hints_delivery_threads = 2
   Duration hints_flush_period = "10000ms"
   Size max_hints_file_size = "128MiB"
}
{code}

And would be represented as following on {{cassandra.yaml}}:

{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 10000ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 10000ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production-use
materialized_views:
  enabled: false 
{code}

The approach above provides a very simple user experience while allowing typed 
configuration in the developer's side.

I think that we can easily fit most database configurations in this 
feature-centric view, but if there are some that we cannot fit into an existing 
feature we could create a new type {{ResourceConfiguration}} which would allow 
to configure a resource not tied to a particular feature.

{quote}I'm still pretty strongly in support of a versioned but intact single 
configuration file.
{quote}
Perhaps I should've made it clear but the split of configuration in multiple 
files is a mere optional convenience of my proposal, which also support 
configurations in a single file for backward-compatibility.

For instance, moving the configuration from the {{features.yaml}} to 
{{core.yaml}} would still render the same global configuration.

I think that the optional splitting of configuration in different files provide 
an organizational benefit of grouping together properties belonging to a 
similar category (ie. core-features which cannot be disabled, optional features 
and guardrails).

My original proposal of starting with 3 initial categories 
(core.yaml/features.yaml/guardrails.yaml) is mostly to facilitate the 
transition to the new configuration model:
 - cassandra.yaml (previously core.yaml): all legacy configurations would 
initially go here separated by section headers
 - features.yaml: all configurations compatible with the new 
{{{}FeatureConfiguration{}}}} model would go here (including new features and 
"migrated" legacy features)
 - guardrails.yaml: all guardrails are collocated in the same file for 
operational simplicity

For instance, the hints configuration is currently flat so it would initially 
go in {{cassandra.yaml}} in the old format:
{code:yaml}
hinted_handoff_enabled: true
max_hint_window: 3h
hinted_handoff_throttle: 1024KiB
max_hints_delivery_threads: 2
hints_flush_period: 10000ms
max_hints_file_size: 128MiB
auto_hints_cleanup_enabled: false
{code}

After someone incrementally decide's to "migrate" the hints configuration from 
the "legacy" format to the new format, it would remove the above entry from 
{{cassandra.yaml}} and add a new entry to {{{}features.yaml{}}}:

{code:yaml}
# even though this configuration is in features.yaml by default
# it will still be valid if we add it to core.yaml, cassandra.yaml
# or even foobar.yaml ;-)
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 10000ms
  max_hints_file_size: 128MiB
{code}

This model allow us to remain backward compatible by supporting legacy 
configurations on {{{}core.yaml{}}}//{{{}cassandra.yaml{}}} and migrating 
configurations incrementally to the new {{FeatureConfiguration}} format on the 
{{features.yaml}} file which would contain all the "modern" configuration.


was (Author: paulo):
Thanks for the additional context [~maedhroz], that is very helpful to 
understand the reasoning behind the proposed nesting.
{quote}For a moment, let's ignore the fact that there's any kind of textual 
configuration file at all for the project, but we still have all the 
knobs/systems/etc. The very first thing I would do is create a "domain model" 
for C* configuration on the Java side, a hierarchy rooted in a Configuration 
container class, which would contain members w/ types like 
ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These 
would be easy to navigate, would provide reasonable points for inline 
documentation, could encapsulate validation logic for relationships between 
parameters within subsystems and features, and could be passed as little 
"kernels" of configuration around the codebase, allowing for better mocking, 
etc.
{quote}
I think we're not very far from what we want the end result to look like from 
the developer's perspective, my proposal is just a simplification of yours 
where instead of a multi-level hierarchy rooted on physical resources 
(cluster/network/storage), I'm proposing a feature-centric domain model 
hierachy with a single level - each feature define its own configuration 
subtree.

The basic construct to create new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;
    
    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would allow to easily create typed configuration for each feature:
 * CommitlogConfiguration
 * HintsConfiguration
 * MaterializedViewsConfiguration

For example this is how "HintsConfiguration" would look like:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
   public HintsConfiguration()
   {
     this.enabled = true;
   } 

   public String getFeatureName()
   {
     return "hinted_handoff";
   }

   boolean auto_hints_cleanup = false
   Duration max_hint_window = "3h"
   Throttle hinted_handoff_throttle = "1024KiB"
   int max_hints_delivery_threads = 2
   Duration hints_flush_period = "10000ms"
   Size max_hints_file_size = "128MiB"
}
{code}
And would be represented as following on {{{}cassandra.yaml{}}}:
{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 10000ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 10000ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production-use
materialized_views:   enabled: false 
{code}
The approach above provides a very simple user experience while allowing typed 
configuration in the developer's side.

I think that we can easily fit most database configurations in this 
feature-centric view, but if there are some that we cannot fit into an existing 
feature we could create a new type {{ResourceConfiguration}} which would allow 
to configure a resource not tied to a particular feature.
{quote}I'm still pretty strongly in support of a versioned but intact single 
configuration file.
{quote}
Perhaps I should've made it clear but the split of configuration in multiple 
files is a mere optional convenience of my proposal, which also support 
configurations in a single file for backward-compatibility.

For instance, moving the configuration from the {{features.yaml}} to 
{{core.yaml}} would still render the same global configuration.

I think that the optional splitting of configuration in different files provide 
an organizational benefit of grouping together properties belonging to a 
similar category (ie. core-features which cannot be disabled, optional features 
and guardrails).

My original proposal of starting with 3 initial categories 
(core.yaml/features.yaml/guardrails.yaml) is mostly to facilitate the 
transition to the new configuration model:
 - cassandra.yaml (previously core.yaml): all legacy configurations would 
initially go here separated by section headers
 - features.yaml: all configurations compatible with the new 
{{{}FeatureConfiguration{}}}} model would go here (including new features and 
"migrated" legacy features)
 - guardrails.yaml: all guardrails are collocated in the same file for 
operational simplicity

For instance, the hints configuration is currently flat so it would initially 
go in {{cassandra.yaml}} in the old format:
{code:yaml}
hinted_handoff_enabled: true
max_hint_window: 3h
hinted_handoff_throttle: 1024KiB
max_hints_delivery_threads: 2
hints_flush_period: 10000ms
max_hints_file_size: 128MiB
auto_hints_cleanup_enabled: false
{code:yaml}
After someone incrementally decide's to "migrate" the hints configuration from 
the "legacy" format to the new format, it would remove the above entry from 
{{cassandra.yaml}} and add a new entry to {{{}features.yaml{}}}:
{code:yaml}
# even though this configuration is in features.yaml by default
# it will still be valid if we add it to core.yaml, cassandra.yaml
# or even foobar.yaml ;-)
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 10000ms
  max_hints_file_size: 128MiB
{code}
This model allow us to remain backward compatible by supporting legacy 
configurations on {{{}core.yaml{}}}//{{{}cassandra.yaml{}}} and migrating 
configurations incrementally to the new {{FeatureConfiguration}} format on the 
{{features.yaml}} file which would contain all the "modern" configuration.

> Move cassandra.yaml toward a nested structure around major database concepts
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but one a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - 
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

Reply via email to