Re: [DISCUSS] Nested YAML configs for new features

Ekaterina Dimitrova Tue, 30 Nov 2021 05:36:59 -0800

“IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work.”



Sailed? I submitted the patch a week ago for review. I am not sure how to
understand this statement. Can you elaborate, please?

On Tue, 30 Nov 2021 at 8:09, [email protected] <[email protected]>
wrote:

> The problem with scoping this to “features” is that we end up with at best
> local coherence. The config file as a whole will end up just as incoherent
> through its design evolution as it has historically.
>
> If you take a look at my proposed layout for the overall config, there is
> a “limits” section that specifies thresholds for reporting warnings and
> errors for various scenarios. In this case, we probably don’t also want
> per-feature limits? We’re also mixing terminology already, with
> limits/thresholds and fail/abort.
>
> It’s a lot of work to come up with a coherent and intuitive config layout.
> We probably want to at least create some documentation in-tree stipulating
> terminology with respect to plurals, verbs/nouns, and specific terms
> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> common end goal for the config file.
>
> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
>
>
> From: David Capwell <[email protected]>
> Date: Monday, 29 November 2021 at 23:44
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >  but I would hate to repeat the mistakes of our past by evolving the
> config in a new direction without any coherent overarching design.
>
> At the start I asked to keep the thread local to new features, but to
> flesh out an “overarching design” more fully, maybe we should increase the
> “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> Standardise config and JVM parameters)?  That is, do we think the following
> is closer to ideal (configs scoped to a feature)
>
> hinted_handoff:
>   enabled: true
>   disabled_datacenters:
>     - DC1
>     - DC2
>   max_window: 3h
>   flush_period: 10s
>   max_file_size: 128mb
>   compression:
>     class_name: LZ4Compressor
>     parameters:
>       a: b
>
> track_warnings:
>   enabled: true
>   local_read_size:
>     warn_threshold: 1mb
>     abort_threshold: 10mb
>   coordinator_read_size:
>     warn_threshold: 5mb
>     abort_threshold: 20mb
>
>
> OR
>
> # I had to rename hint configs as there was 0 consistent naming
> hinted_handoff_enabled: true
> hinted_handoff_disabled_datacenters:
>   - 'DC1'
>   - 'DC2'
> hinted_handoff_max_window: 3h
> hinted_handoff_max_file_size: 128mb
> hinted_handoff_flush_period: 10s
> hinted_handoff_compression:
>   class_name: LZ4Compressor
>   parameters:
>     a: b
>
> track_warnings_enabled: true
> track_warnings_local_read_size_warn_threshold: 1mb
> track_warnings_local_read_size_abort_threshold: 10mb
> track_warnings_coordinator_read_size_warn_threshold: 5mb
> track_warnings_coordinator_read_size_abort_threshold: 20mb
>
>
> The main issue I have with flat structure is that we have no way to
> enforce standard naming; if you look at the hint example there were at
> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> actually maintain that?).  And one of the core reasons track_warnings went
> nested was that warn/abort sometimes became warn/fail and threshold
> sometimes was thresholds… By embracing nested structure we can actually
> enforce consistency; with flat we have no way to maintain consistency.
>
> Additionally by embracing the nested structure we can accept a flat one as
> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> get the consistency of nested, and the “grep” benefits of flat.
>
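To make the "accept flat as well as nested" idea concrete, here is a minimal Python sketch. It is not the code from the CASSANDRA-17166 PR; it only assumes that "." is the separator and that a dotted key may appear at any nesting level. The names expand_dotted and merge are hypothetical.

```python
def merge(target, source):
    """Deep-merge source into target in place; later non-dict values win."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def expand_dotted(config, sep="."):
    """Return a nested dict in which every 'a.b.c' key has been expanded."""
    result = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand_dotted(value, sep)
        # Wrap the value in one dict per path segment, innermost first.
        for part in reversed(key.split(sep)):
            value = {part: value}
        merge(result, value)
    return result


if __name__ == "__main__":
    mixed = {
        "track_warnings": {
            "enabled": True,
            "local_read_size": {"warn_threshold": "1mb"},
            "coordinator_read_size.warn_threshold": "5mb",
        },
        "track_warnings.coordinator_read_size.abort_threshold": "20mb",
    }
    print(expand_dotted(mixed))
```

With this shape the nested form stays the canonical one, and flat keys are only an alternative spelling that is normalised on load.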
>
> > On Nov 29, 2021, at 2:17 PM, [email protected] wrote:
> >
> > If we’re thinking of moving towards nested configuration, then before
> employing the approach further we would ideally consider what a fully
> nested config looks like for the project. Ekaterina has done a lot to clean
> up inconsistent naming, but I would hate to repeat the mistakes of our past
> by evolving the config in a new direction without any coherent overarching
> design.
> >
> > In case anyone missed it in the earlier discussion, this was my attempt
> to prototype a nested config:
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >
> > I don’t have any specific attachment to it, but settling on some
> approximate scheme would be helpful IMO.
> >
> > From: David Capwell <[email protected]>
> > Date: Monday, 29 November 2021 at 20:38
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> What should our default example cassandra.yaml file use (flat or
> nested)?  Currently default shows nested
> >
> > Was told this statement was confusing, so trying to clarify.  At the
> moment we do not allow a nested config to be expressed in any way outside
> of nesting it (excluding YAML’s ability to inline objects), so if we did
> allow flat config representation of nested configs, then this would be a
> brand new feature; we currently show the nested structure in cassandra.yaml
> >
> >> On Nov 29, 2021, at 11:58 AM, David Capwell <[email protected]>
> wrote:
> >>
> >> Thanks everyone for the comments, I hope below is a good summary of all
> the talking points?
> >>
> >> - We already use nested configs (networking, seed provider, commit
> >>   log/hint compression, back pressure, etc.)
> >> - Flat configs are easier for grep, but this can be solved with
> >>   grep -A/-B and/or yq
> >> - It would be possible to support flat versions of our configs in
> >>   cassandra.yaml (in addition to the nested versions)
> >> - "Settings" vtable currently uses the "_" separator (example of
> >>   encryption/audit log).  Switching to "." would be a change in behavior
> >>   which may impact some users
> >> - The "." separator for nested configs is common in other systems (yq,
> >>   Elasticsearch, etc.)
> >> - "Structured / nested config is easier for human eyes to read"... "Flat
> >>   config is harder for human eyes but easy for simple scripts"
> >> - For learning what configs are enabled, cassandra.yaml isn't the best
> >>   interface as it may not reflect the actual configs; we can better expose
> >>   this in CQL and/or Sidecar
> >> - What should our default example cassandra.yaml file use (flat or
> >>   nested)?  Currently default shows nested
> >> - When projecting the Config into CQL, we may want to consider UDTs to
> >>   represent the complex types
> >> - Current limitations in CQL make nested structures hard to work with; it
> >>   may be worthwhile to expand CQL support for nested structures.
> >>
> >> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> be reusable outside of yaml parsing, 2) support setters (we currently do,
> but setters must be snake case… I fixed that), 3) support both nested and
> flat, 4) support ignoring fields in a consistent way (Settings vtable
> will include things SnakeYAML won’t and vice versa).
> >>
> >> <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a final
> ready to merge thing, but instead a POC to show how we can solve a lot of
> the core problems in a consistent and reusable manner.
> >>
> >> The following cassandra.yaml was used to show both worlds would work
> fine in the config (and complement each other)
> >>
> >> track_warnings:
> >>   enabled: true
> >>   # nested relative to the local level (TrackWarnings)
> >>   coordinator_read_size.warn_threshold_kb: 1024
> >>   local_read_size.abort_threshold_kb: 1024
> >>   row_index_size:
> >>     warn_threshold_kb: 1024
> >>     abort_threshold_kb: 1024
> >> # nested relative to the top level
> >> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> >>
> >> For the “Settings” vtable, a new Loader interface was added to get all
> the properties, and Properties.flatten would turn every property into a
> “flattened” version (isScalar (isPrimitive or not hasSubProperties) or
> isCollection).  This doesn’t solve 100% of the issues that the vtable has
> (types such as Duration would need additional translation as they are
> Scalar but need a translation from String -> Duration), and doesn’t solve
> the fact the table currently uses “_”.
> >>
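The flattening step described above can be illustrated with a short Python sketch. This is not Properties.flatten from the PR; it only assumes that scalars and collections are leaves and that nested mappings are walked recursively. The function name flatten and its sep parameter are hypothetical.

```python
def flatten(config, prefix="", sep="."):
    """Yield (flat_name, value) pairs; scalars and lists are treated as leaves."""
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # A non-empty mapping has sub-properties, so keep descending.
            yield from flatten(value, name, sep)
        else:
            yield name, value


if __name__ == "__main__":
    nested = {
        "hinted_handoff": {
            "enabled": True,
            "disabled_datacenters": ["DC1", "DC2"],
            "compression": {
                "class_name": "LZ4Compressor",
                "parameters": {"a": "b"},
            },
        }
    }
    for name, value in flatten(nested):
        print(name, "=", value)
```

Passing sep="_" instead would produce names in the style the table uses today, at the cost of no longer being able to tell a separator from an underscore inside a name.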
> >>> On Nov 29, 2021, at 10:11 AM, [email protected] wrote:
> >>>
> >>> I meant to imply we should improve our UDT usability to support this
> kind of querying, essentially – but that if we support a simple
> text->property setup we might want to offer LIKE support so we can search
> them (via simple filtering, not any index) – which is actually pretty easy
> to provide.
> >>>
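As a rough illustration of LIKE-style discovery over flattened names (simple filtering, no index), here is a Python sketch. It assumes only the "%" wildcard is supported, since "_" is so common in config names that treating it as a wildcard would be awkward; the helper names like and filter_settings are hypothetical and say nothing about how CQL would implement this.

```python
import re


def like(pattern):
    """Compile a LIKE-style pattern where '%' matches any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.compile(regex, re.DOTALL)


def filter_settings(settings, pattern):
    """Return the entries whose name matches the whole pattern."""
    matcher = like(pattern)
    return {
        name: value
        for name, value in settings.items()
        if matcher.fullmatch(name)
    }


if __name__ == "__main__":
    settings = {
        "track_warnings.local_read_size.warn_threshold": "1mb",
        "track_warnings.local_read_size.abort_threshold": "10mb",
        "hinted_handoff.enabled": "true",
    }
    print(filter_settings(settings, "track_warnings.%"))
    print(filter_settings(settings, "%abort_threshold"))
```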
> >>> I think we should aim to provide users all the facilities they need to
> interact with config via vtables. If the user requires external tooling, it
> suggests a weakness in CQL that we should address, and maybe help the user
> in other scenarios too…
> >>>
> >>> From: Joseph Lynch <[email protected]>
> >>> Date: Monday, 29 November 2021 at 17:32
> >>> To: [email protected] <[email protected]>
> >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>> On Mon, Nov 29, 2021 at 11:51 AM [email protected]
> >>> <[email protected]> wrote:
> >>>>
> >>>> Maybe we can make our query language more expressive 😊
> >>>>
> >>>> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?
> >>>
> >>> This sounds more complicated than just having the settings virtual
> >>> table return text (dot encoded) -> text (json) and probably not even
> >>> that much more useful. A full table scan on the settings table could
> >>> return all top level keys (strings before the first dot) and if we
> >>> just return a valid json string then users can bring their own
> >>> querying capabilities via jq [1], or one line of code in almost any
> >>> programming language (especially python, perl, etc ...).
> >>>
> >>> Alternatively if we want to modify the grammar it seems supporting
> >>> structured data querying on text fields would maybe be preferable
> >>> to LIKE since you could get what you want without a grammar change and
> >>> if we could generalize to any text column it would be amazingly useful
> >>> elsewhere to users. For example, we could emulate jq's query syntax in
> >>> the select which is, imo, best-in-class for quickly querying into
> >>> nested structures. Assuming a key (text) -> value (json) schema:
> >>>
> >>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> >>>
> >>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> >>>
> >>> To have exactly jq syntax (but harder to parse) it would be:
> >>>
> >>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> >>>
> >>> Since we're not indexing the structured data in any way, filtering
> >>> before selection probably doesn't give us much performance improvement
> >>> as we'd still have to parse the whole text field in most cases.
> >>>
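The client-side alternative ("one line of code in almost any programming language") can be sketched in Python as below. It assumes the value column holds valid JSON with double-quoted strings (json.loads rejects the single-quoted form shown in the example above), and it accepts both path spellings from the two SELECT examples. The function select_path is hypothetical, not an existing CQL or driver API.

```python
import json
import re

# One path step: ".name" or "name", "[0]", or ".0"
_STEP = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\.(\d+)")


def select_path(json_text, path):
    """Parse json_text and walk a path such as 'b[0].c.d' or 'b.0.c.d'."""
    value = json.loads(json_text)  # the whole text field is parsed first
    pos = 0
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        name, bracket_index, dot_index = match.groups()
        if name is not None:
            value = value[name]
        else:
            index = bracket_index if bracket_index is not None else dot_index
            value = value[int(index)]
        pos = match.end()
    return value


if __name__ == "__main__":
    stored = '{"b": [{"c": {"d": 4}}]}'
    print(select_path(stored, "b[0].c.d"))
    print(select_path(stored, "b.0.c.d"))
```

As noted above, the full text is parsed before any selection happens, so this is a convenience rather than a performance optimisation.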
> >>> -Joey
> >>>
> >>> [1] https://stedolan.github.io/jq/
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>
> >
> >
>
>
>
