Re: [DISCUSS] Nested YAML configs for new features

Ekaterina Dimitrova Tue, 30 Nov 2021 06:09:34 -0800

Thank you for confirming as I misread your email at first :-)
I had a chat with David last week and I don’t think his plan is a rework
of 15234 but rather incremental improvements on top of it.
Regarding config, after spending time cleaning around and looking into it
in more detail, my only appeal is:
- Centralized management and not 5 places to change things when you add new
config so we are less error-prone
- Documenting things for people who add new config or for our users (I
promised and I will do it for 15234 but it will be good to continue doing
it with any further changes down the road)
- Be careful with breaking changes


Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org>
wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t
> want to hold you up any longer.
>
> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> Date: Tuesday, 30 November 2021 at 13:44
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> “
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work. “
>
>
>  Sailed? I submitted the patch a week ago for review. Not sure how to
> understand this statement. Can you elaborate, please?
>
> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org>
> wrote:
>
> > The problem with scoping this to “features” is that we end up with at
> > best local coherence. The config file as a whole will end up just as
> > incoherent through its design evolution as it has historically.
> >
> > If you take a look at my proposed layout for the overall config, there is
> > a “limits” section that specifies thresholds for reporting warnings and
> > errors for various scenarios. In this case, we probably don’t also want
> > per-feature limits? We’re also mixing terminology already, with
> > limits/thresholds and fail/abort.
> >
> > It’s a lot of work to come up with a coherent and intuitive config
> > layout. We probably want to at least create some documentation in-tree
> > stipulating terminology with respect to plurals, verbs/nouns, and
> > specific terms (period, abort, limit, datacenter vs dc, etc), but ideally
> > we would have a common end goal for the config file.
> >
> > > leave non-features to CASSANDRA-15234
> >
> > IMO 15234 has sailed – it’s been held up for a long time, and was brought
> > to this list for discussion with no engagement. Ekaterina is long overdue
> > being able to commit her work.
> >
> >
> > From: David Capwell <dcapw...@apple.com.INVALID>
> > Date: Monday, 29 November 2021 at 23:44
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >  but I would hate to repeat the mistakes of our past by evolving the
> > > config in a new direction without any coherent overarching design.
> >
> > At the start I asked to keep the thread local to new features, but to
> > flesh out an “overarching design” more, maybe we should increase the
> > “desired” scope to be “feature” (and leave non-features to
> > CASSANDRA-15234 - Standardise config and JVM parameters)?  Aka, do we
> > think the following is more ideal (configs scoped to a feature)
> >
> > hinted_handoff:
> >   enabled: true
> >   disabled_datacenters:
> >     - DC1
> >     - DC2
> >   max_window: 3h
> >   flush_period: 10s
> >   max_file_size: 128mb
> >   compression:
> >     class_name: LZ4Compressor
> >     parameters:
> >       a: b
> >
> > track_warnings:
> >   enabled: true
> >   local_read_size:
> >     warn_threshold: 1mb
> >     abort_threshold: 10mb
> >   coordinator_read_size:
> >     warn_threshold: 5mb
> >     abort_threshold: 20mb
> >
> >
> > OR
> >
> > # I had to rename hint configs as there was 0 consistent naming
> > hinted_handoff_enabled: true
> > hinted_handoff_disabled_datacenters:
> >   - 'DC1'
> >   - 'DC2'
> > hinted_handoff_max_window: 3h
> > hinted_handoff_max_file_size: 128mb
> > hinted_handoff_flush_period: 10s
> > hinted_handoff_compression:
> >   class_name: LZ4Compressor
> >   parameters:
> >     a: b
> >
> > track_warnings_enabled: true
> > track_warnings_local_read_size_warn_threshold: 1mb
> > track_warnings_local_read_size_abort_threshold: 10mb
> > track_warnings_coordinator_read_size_warn_threshold: 5mb
> > track_warnings_coordinator_read_size_abort_threshold: 20mb
> >
> >
> > The main issue I have with flat structure is that we have no way to
> > enforce standard naming; if you look at the hint example there were at
> > least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can
> > we actually maintain that?).  And one of the core reasons track_warnings
> > went nested was that warn/abort sometimes became warn/fail and threshold
> > sometimes was thresholds…. By embracing nested structure we can actually
> > enforce consistency, with flat we have no way to maintain consistency.
> >
> > Additionally by embracing the nested structure we can accept a flat one
> > as well (PR in CASSANDRA-17166 shows this working) if users desire it; so
> > we get the consistency of nested, and the “grep” benefits of flat.
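
The consistency argument above can be sketched in code. This is a minimal
illustration in Python, not the Cassandra implementation: SizeThresholds,
TrackWarnings and from_mapping are hypothetical names, and the input is
assumed to be a mapping already parsed from the nested YAML. Because every
warn/abort pair reuses one type, a drifted name is rejected rather than
becoming a new convention.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SizeThresholds:
    """One shared shape for every warn/abort pair (hypothetical type)."""
    warn_threshold: str
    abort_threshold: str

    @classmethod
    def from_mapping(cls, raw: dict) -> "SizeThresholds":
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            # A drifted name such as "fail_threshold" or "warn_thresholds"
            # fails here instead of silently being accepted.
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        return cls(**raw)


@dataclass(frozen=True)
class TrackWarnings:
    """Feature-scoped section; each sub-section reuses SizeThresholds."""
    enabled: bool
    local_read_size: SizeThresholds
    coordinator_read_size: SizeThresholds

    @classmethod
    def from_mapping(cls, raw: dict) -> "TrackWarnings":
        return cls(
            enabled=raw["enabled"],
            local_read_size=SizeThresholds.from_mapping(raw["local_read_size"]),
            coordinator_read_size=SizeThresholds.from_mapping(
                raw["coordinator_read_size"]
            ),
        )
```

With flat keys there is no shared type to hang this check on, which is the
maintenance concern raised above.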
> >
> >
> > > On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> > >
> > > If we’re thinking of moving towards nested configuration, then before
> > > employing the approach further we would ideally consider what a fully
> > > nested config looks like for the project. Ekaterina has done a lot to
> > > clean up inconsistent naming, but I would hate to repeat the mistakes of
> > > our past by evolving the config in a new direction without any coherent
> > > overarching design.
> > >
> > > In case anyone missed it in the earlier discussion, this was my attempt
> > > to prototype a nested config:
> > > https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> > >
> > > I don’t have any specific attachment to it, but settling on some
> > > approximate scheme would be helpful IMO.
> > >
> > > From: David Capwell <dcapw...@apple.com.INVALID>
> > > Date: Monday, 29 November 2021 at 20:38
> > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >> What should our default example cassandra.yaml file use (flat or
> > >> nested)?  Currently default shows nested
> > >
> > > Was told this statement was confusing, so trying to clarify.  At the
> > > moment we do not allow a nested config to be expressed in any way outside
> > > of nesting it (excluding YAML’s ability to inline objects), so if we did
> > > allow flat config representation of nested configs, then this would be a
> > > brand new feature; we currently show the nested structure in
> > > cassandra.yaml
> > >
> > >> On Nov 29, 2021, at 11:58 AM, David Capwell
> > >> <dcapw...@apple.com.INVALID> wrote:
> > >>
> > >> Thanks everyone for the comments, I hope below is a good summary of
> > >> all the talking points?
> > >>
> > >> - We already use nested configs (networking, seed provider, commit
> > >>   log/hint compression, back pressure, etc.)
> > >> - Flat configs are easier for grep, but can be solved with grep -A/-B
> > >>   and/or yq
> > >> - It would be possible to support flat versions of our configs in
> > >>   cassandra.yaml (in addition to the nested versions)
> > >> - "Settings" vtable currently uses the "_" separator (example of
> > >>   encryption/audit log).  Switching to "." would be a change in
> > >>   behavior which may impact some users
> > >> - "." separator for nested configs is common in other systems (yq,
> > >>   Elasticsearch, etc.)
> > >> - "Structured / nested config is easier for human eyes to read"... "Flat
> > >>   config is harder for human eyes but easy for simple scripts"
> > >> - For learning what configs are enabled, cassandra.yaml isn't the best
> > >>   interface as it may not reflect the actual configs; we can better
> > >>   expose this in CQL and/or Sidecar
> > >> - What should our default example cassandra.yaml file use (flat or
> > >>   nested)?  Currently default shows nested
> > >> - When projecting the Config into CQL, we may want to consider UDTs to
> > >>   represent the complex types
> > >> - Current limitations in CQL make nested structures hard to work with;
> > >>   it may be worthwhile to expand CQL support for nested structures.
> > >>
> > >> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> > >> be reusable outside of yaml parsing, 2) support setters (we currently
> > >> do, but setters must be snake case… I fixed that)…, 3) support both
> > >> nested and structured, 4) support ignoring fields in a consistent way
> > >> (Settings vtable will include things SnakeYAML won’t and vice versa).
> > >>
> > >> <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a
> > >> final ready to merge thing, but instead a POC to show how we can solve
> > >> a lot of the core problems in a consistent and reusable manner.
> > >>
> > >> The following cassandra.yaml was used to show both worlds would work
> > >> fine in the config (and complement each other)
> > >>
> > >> track_warnings:
> > >>   enabled: true
> > >>   # nested relative to the local level (TrackWarnings)
> > >>   coordinator_read_size.warn_threshold_kb: 1024
> > >>   local_read_size.abort_threshold_kb: 1024
> > >>   row_index_size:
> > >>     warn_threshold_kb: 1024
> > >>     abort_threshold_kb: 1024
> > >> # nested relative to the top level
> > >> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> > >>
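
One way to picture how dotted keys and YAML structure could coexist in a
single file is to normalise both into one nested mapping. The sketch below
is an assumption-laden illustration in Python, not the code in the PR:
expand_dotted_keys and _merge are hypothetical helpers, and the input is
assumed to be the mapping a YAML parser would already have produced.

```python
def expand_dotted_keys(raw: dict) -> dict:
    """Return a nested dict where a key like 'a.b' becomes {'a': {'b': ...}}."""
    result: dict = {}
    for key, value in raw.items():
        if isinstance(value, dict):
            # Dotted keys may also appear inside a structured section.
            value = expand_dotted_keys(value)
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} descends into a scalar at {part!r}")
        _merge(node, leaf, value, key)
    return result


def _merge(node: dict, leaf: str, value, source_key: str) -> None:
    """Merge value into node[leaf], refusing to set the same leaf twice."""
    existing = node.get(leaf)
    if isinstance(existing, dict) and isinstance(value, dict):
        for sub_key, sub_value in value.items():
            _merge(existing, sub_key, sub_value, source_key)
    elif leaf in node:
        raise ValueError(f"{source_key!r} sets {leaf!r} twice")
    else:
        node[leaf] = value
```

Rejecting a leaf that is set through both forms is a design choice made for
this sketch; a real loader might instead define a precedence order.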
> > >> For the “Settings” vtable, a new Loader interface was added to get all
> > >> the properties, and Properties.flatten would turn every property into a
> > >> “flattened” version (isScalar (isPrimitive or not hasSubProperties) or
> > >> isCollection).  This doesn’t solve 100% of the issues that vtable has
> > >> (types such as Duration would need additional translation as they are
> > >> Scalar but need a translation from String -> Duration), and doesn’t
> > >> solve the fact the table currently uses “_”.
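
The flattening idea described above can be illustrated with a short Python
sketch. It is not the Properties.flatten code from the PR: the flatten
function below is hypothetical, works on plain mappings, and treats every
non-empty mapping as having sub-properties, so it cannot tell a map-typed
setting from a nested section the way typed Java properties could.

```python
def flatten(config: dict, separator: str = ".", prefix: str = "") -> dict:
    """Flatten nested mappings into {path: leaf} using the given separator."""
    flat = {}
    for name, value in config.items():
        path = f"{prefix}{separator}{name}" if prefix else name
        if isinstance(value, dict) and value:
            # Non-empty mapping: recurse into its sub-properties.
            flat.update(flatten(value, separator, path))
        else:
            # Scalars, collections such as lists, and empty mappings are leaves.
            flat[path] = value
    return flat
```

Calling flatten(config, separator="_") yields names in the style the table
uses today, but that form cannot be reversed unambiguously because property
names themselves contain underscores.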
> > >>
> > >>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
> > >>>
> > >>> I meant to imply we should improve our UDT usability to support this
> > >>> kind of querying, essentially – but that if we support a simple
> > >>> text->property setup we might want to offer LIKE support so we can
> > >>> search them (via simple filtering, not any index) – which is actually
> > >>> pretty easy to provide.
> > >>>
> > >>> I think we should aim to provide users all the facilities they need
> > >>> to interact with config via vtables. If the user requires external
> > >>> tooling, it suggests a weakness in CQL that we should address, and
> > >>> maybe help the user in other scenarios too…
> > >>>
> > >>> From: Joseph Lynch <joe.e.ly...@gmail.com>
> > >>> Date: Monday, 29 November 2021 at 17:32
> > >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
> > >>> <bened...@apache.org> wrote:
> > >>>>
> > >>>> Maybe we can make our query language more expressive 😊
> > >>>>
> > >>>> We might anyway want to introduce e.g. a LIKE filtering option to
> > >>>> find/discover flattened config parameters?
> > >>>
> > >>> This sounds more complicated than just having the settings virtual
> > >>> table return text (dot encoded) -> text (json) and probably not even
> > >>> that much more useful. A full table scan on the settings table could
> > >>> return all top level keys (strings before the first dot) and if we
> > >>> just return a valid json string then users can bring their own
> > >>> querying capabilities via jq [1], or one line of code in almost any
> > >>> programming language (especially python, perl, etc ...).
> > >>>
> > >>> Alternatively if we want to modify the grammar it seems supporting
> > >>> structured data querying on text fields would maybe be preferable
> > >>> to LIKE since you could get what you want without a grammar change
> > >>> and if we could generalize to any text column it would be amazingly
> > >>> useful elsewhere to users. For example, we could emulate jq's query
> > >>> syntax in the select which is, imo, best-in-class for quickly querying
> > >>> into nested structures. Assuming a key (text) -> value (json) schema:
> > >>>
> > >>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> > >>>
> > >>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> To have exactly jq syntax (but harder to parse) it would be:
> > >>>
> > >>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> Since we're not indexing the structured data in any way, filtering
> > >>> before selection probably doesn't give us much performance
> > >>> improvement as we'd still have to parse the whole text field in most
> > >>> cases.
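
The "bring your own querying" option above really is only a few lines on the
client side. The Python sketch below resolves both path spellings shown
(b.0.c.d and b[0].c.d) against a JSON string; json_path is a hypothetical
helper, not an existing CQL or jq API, and the value is assumed to be valid
double-quoted JSON rather than the single-quoted form in the example row.

```python
import json
import re

# One path step: ".name" or "name", "[3]", or ".3"
_TOKEN = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\.(\d+)")


def json_path(text: str, path: str):
    """Parse text as JSON and walk a jq-like path such as 'b[0].c.d'."""
    node = json.loads(text)  # the whole value is parsed before any selection
    pos = 0
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        name, bracket_index, dot_index = match.groups()
        if name is not None:
            node = node[name]
        else:
            index = bracket_index if bracket_index is not None else dot_index
            node = node[int(index)]
        pos = match.end()
    return node
```

As the paragraph above points out, the whole text field is parsed either
way, so the selection is a convenience rather than a performance gain.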
> > >>>
> > >>> -Joey
> > >>>
> > >>> [1] https://stedolan.github.io/jq/
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >
> > >
> >
> >
> >
>
