Please find my comments inline below.

* Good with nested configs - the question is how they will be introduced
and maintained. I wouldn't advocate for maintaining more than one yaml file;
rather, as you mentioned some time ago (if I remember correctly), we could
have one format as the default and just document the support for the other
one(s). Which one is the default is a different topic.
* Where/How we group is an open question, maybe we move this to a JIRA as
follow up work to CASSANDRA-15234? - This is not part of CASSANDRA-15234 as
per all the discussions, and that ticket is already in review (thank you for
your first quick round, by the way, I appreciate it!)
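
To make the "one default format, the other one supported" idea a bit more
concrete, here is a minimal sketch in plain Python. It is not the actual
Cassandra loader; the function names flatten and nest are hypothetical, and
it only assumes that a "." separates the levels of a nested key.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested mapping into a flat one with dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the path built so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested mapping from dot-separated keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested_form = {"foo": {"bar": {"baz": 42}}}
flat_form = flatten(nested_form)
print(flat_form)                        # {'foo.bar.baz': 42}
print(nest(flat_form) == nested_form)   # True
```

Since both forms carry the same information, documenting one as the default
and the other as supported would not require maintaining two files.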

Spoke with Ekaterina about this, and not solved in 15234; let's move this
to a follow up JIRA for 15234? - For the broader audience: what I currently
address around naming in CASSANDRA-15234 is removing the unit suffix from the
config parameter names and moving them to the noun_verb format. After all the
discussions, and realizing the great interest and variety of opinions, I
tried to split more tickets out of CASSANDRA-15234 and to keep primarily the
new custom types and the new framework with backward compatibility as the
main body of work, which is also better for the reviewers. Last year I came
up with the idea of reorganizing the config file a bit, which led to
discussions. So, in line with that more incremental approach, I suggested
when submitting for review that we open a new ticket for the reorganization
of our config file. Perhaps we can add the abort/fail and any other similar
concerns/questions there, after CASSANDRA-15234?
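
As a rough illustration of what "removing the unit suffix" means, the unit
moves from the parameter name into the value (as in the 3h, 10s and 128mb
values quoted further down). The sketch below is plain Python and not the
CASSANDRA-15234 code; the helper name parse_with_unit, the unit tables, and
the choice of binary multiples for sizes are all assumptions made for the
example.

```python
import re
from typing import Dict

# Hypothetical unit tables, for illustration only.
DURATION_MS: Dict[str, int] = {
    "ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000,
}
# Assumption: sizes use binary multiples (1kb = 1024 bytes).
SIZE_BYTES: Dict[str, int] = {
    "b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
}

# A non-negative integer followed by a unit, e.g. "10s" or "128mb".
_VALUE = re.compile(r"^(\d+)\s*([a-z]+)$")


def parse_with_unit(text: str, units: Dict[str, int]) -> int:
    """Convert a value such as '3h' into the base unit of the given table."""
    match = _VALUE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"expected <number><unit>, got {text!r}")
    amount, unit = match.groups()
    if unit not in units:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    return int(amount) * units[unit]


print(parse_with_unit("10s", DURATION_MS))   # 10000
print(parse_with_unit("3h", DURATION_MS))    # 10800000
print(parse_with_unit("128mb", SIZE_BYTES))  # 134217728
```

With the unit carried by the value, a name no longer needs an _in_ms or _kb
style suffix, and an unknown or missing unit can be rejected when the config
is loaded.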


On Fri, 3 Dec 2021 at 13:34, David Capwell <dcapw...@apple.com.invalid>
wrote:

> Thanks everyone for the feedback!  If I am reading this properly I am
> seeing the following
>
> * Good with nested configs
> * Good with YAML layer supporting flat structure (possibly foo.bar.baz for
> the path foo: {bar: {baz: 42}}); how this relates to the Settings table
> should be resolved, but there are open tickets for this (enhance our YAML
> CASSANDRA-17166, and support updates to Settings vtable CASSANDRA-15254)
> * Where/How we group is an open question, maybe we move this to a JIRA as
> follow up work to CASSANDRA-15234?
>
> > We’re also mixing terminology already, with limits/thresholds and
> > fail/abort.
>
> Spoke with Ekaterina about this, and not solved in 15234; let's move this
> to a follow up JIRA for 15234?
>
> > On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e.dimitr...@gmail.com>
> > wrote:
> >
> > Thank you for confirming as I misread your email at first :-)
> > I had a chat with David last week and I don’t think his plan is reworking
> > of 15234 but incremental improvements on top of it.
> > Regarding config, after spending time cleaning around and looking more into
> > detail my only appeal is:
> > - Centralized management and not 5 places to change things when you add new
> > config so we are less error-prone
> > - Documenting things for people who add new config or for our users (I
> > promised and I will do it for 15234 but it will be good to continue doing
> > it with any further changes down the road)
> > - be careful with breaking changes
> >
> > Thank you
> > Ekaterina
> >
> > On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org>
> > wrote:
> >
> >> I mean that it has been waiting for months, is ready to go, and I don’t
> >> want to hold you up any longer.
> >>
> >> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> >> Date: Tuesday, 30 November 2021 at 13:44
> >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> “
> >> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> >> to this list for discussion with no engagement. Ekaterina is long overdue
> >> being able to commit her work. ”
> >>
> >>
> >> Sailed? I submitted the patch a week ago for review. Not sure how to
> >> understand this statement. Can you elaborate, please?
> >>
> >> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org>
> >> wrote:
> >>
> >>> The problem with scoping this to “features” is that we end up with at best
> >>> local coherence. The config file as a whole will end up just as incoherent
> >>> through its design evolution as it has historically.
> >>>
> >>> If you take a look at my proposed layout for the overall config, there is
> >>> a “limits” section that specifies thresholds for reporting warnings and
> >>> errors for various scenarios. In this case, we probably don’t also want
> >>> per-feature limits? We’re also mixing terminology already, with
> >>> limits/thresholds and fail/abort.
> >>>
> >>> It’s a lot of work to come up with a coherent and intuitive config layout.
> >>> We probably want to at least create some documentation in-tree stipulating
> >>> terminology with respect to plurals, verbs/nouns, and specific terms
> >>> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> >>> common end goal for the config file.
> >>>
> >>>> leave non-features to CASSANDRA-15234
> >>>
> >>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> >>> to this list for discussion with no engagement. Ekaterina is long overdue
> >>> being able to commit her work.
> >>>
> >>>
> >>> From: David Capwell <dcapw...@apple.com.INVALID>
> >>> Date: Monday, 29 November 2021 at 23:44
> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>> but I would hate to repeat the mistakes of our past by evolving the
> >>>> config in a new direction without any coherent overarching design.
> >>>
> >>> At the start I asked to keep the thread local to new features, but to more
> >>> flesh out an “overarching design” maybe we should increase the “desired”
> >>> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> >>> Standardise config and JVM parameters)?  Aka, do we think the following is
> >>> more ideal (configs scoped to a feature)
> >>>
> >>> hinted_handoff:
> >>>  enabled: true
> >>>  disabled_datacenters:
> >>>    - DC1
> >>>    - DC2
> >>>  max_window: 3h
> >>>  flush_period: 10s
> >>>  max_file_size: 128mb
> >>>  compression:
> >>>    class_name: LZ4Compressor
> >>>    parameters:
> >>>      a: b
> >>>
> >>> track_warnings:
> >>>  enabled: true
> >>>  local_read_size:
> >>>    warn_threshold: 1mb
> >>>    abort_threshold: 10mb
> >>>  coordinator_read_size:
> >>>    warn_threshold: 5mb
> >>>    abort_threshold: 20mb
> >>>
> >>>
> >>> OR
> >>>
> >>> # I had to rename hint configs as there was 0 consistent naming
> >>> hinted_handoff_enabled: true
> >>> hinted_handoff_disabled_datacenters:
> >>>  - 'DC1'
> >>>  - 'DC2'
> >>> hinted_handoff_max_window: 3h
> >>> hinted_handoff_max_file_size: 128mb
> >>> hinted_handoff_flush_period: 10s
> >>> hinted_handoff_compression:
> >>>  class_name: LZ4Compressor
> >>>  parameters:
> >>>    a: b
> >>>
> >>> track_warnings_enabled: true
> >>> track_warnings_local_read_size_warn_threshold: 1mb
> >>> track_warnings_local_read_size_abort_threshold: 10mb
> >>> track_warnings_coordinator_read_size_warn_threshold: 5mb
> >>> track_warnings_coordinator_read_size_abort_threshold: 20mb
> >>>
> >>>
> >>> The main issue I have with flat structure is that we have no way to
> >>> enforce standard naming; if you look at the hint example there were at
> >>> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> >>> actually maintain that?).  And one of the core reasons track_warnings went
> >>> nested was that warn/abort sometimes became warn/fail and threshold
> >>> sometimes was thresholds. By embracing nested structure we can actually
> >>> enforce consistency; with flat we have no way to maintain consistency.
> >>>
> >>> Additionally by embracing the nested structure we can accept a flat one as
> >>> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> >>> get the consistency of nested, and the “grep” benefits of flat.
> >>>
> >>>
> >>>> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> >>>>
> >>>> If we’re thinking of moving towards nested configuration, then before
> >>>> employing the approach further we would ideally consider what a fully
> >>>> nested config looks like for the project. Ekaterina has done a lot to clean
> >>>> up inconsistent naming, but I would hate to repeat the mistakes of our past
> >>>> by evolving the config in a new direction without any coherent overarching
> >>>> design.
> >>>>
> >>>> In case anyone missed it in the earlier discussion, this was my attempt
> >>>> to prototype a nested config:
> >>>> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >>>>
> >>>> I don’t have any specific attachment to it, but settling on some
> >>>> approximate scheme would be helpful IMO.
> >>>>
> >>>> From: David Capwell <dcapw...@apple.com.INVALID>
> >>>> Date: Monday, 29 November 2021 at 20:38
> >>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>> What should our default example cassandra.yaml file use (flat or
> >>>>> nested)?  Currently default shows nested
> >>>>
> >>>> Was told this statement was confusing, so trying to clarify.  At the
> >>>> moment we do not allow a nested config to be expressed in any way outside
> >>>> of nesting it (excluding YAML’s ability to inline objects), so if we did
> >>>> allow flat config representation of nested configs, then this would be a
> >>>> brand new feature; we currently show the nested structure in cassandra.yaml
> >>>>
> >>>>> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID>
> >>>>> wrote:
> >>>>>
> >>>>> Thanks everyone for the comments, I hope below is a good summary of all
> >>>>> the talking points?
> >>>>>
> >>>>> * We already use nested configs (networking, seed provider, commit
> >>>>> log/hint compression, back pressure, etc.)
> >>>>> * Flat configs are easier for grep, but this can be solved with grep
> >>>>> -A/-B and/or yq
> >>>>> * It would be possible to support flat versions of our configs in
> >>>>> cassandra.yaml (in addition to the nested versions)
> >>>>> * "Settings" vtable currently uses the "_" separator (example of
> >>>>> encryption/audit log).  Switching to "." would be a change in behavior
> >>>>> which may impact some users
> >>>>> * "." separator for nested configs is common in other systems (yq,
> >>>>> Elasticsearch, etc.)
> >>>>> * "Structured / nested config is easier for human eyes to read"... "Flat
> >>>>> config is harder for human eyes but easy for simple scripts"
> >>>>> * For learning what configs are enabled, cassandra.yaml isn't the best
> >>>>> interface as it may not reflect the actual configs; we can better expose
> >>>>> this in CQL and/or Sidecar
> >>>>> * What should our default example cassandra.yaml file use (flat or
> >>>>> nested)?  Currently default shows nested
> >>>>> * When projecting the Config into CQL, we may want to consider UDTs to
> >>>>> represent the complex types
> >>>>> * Current limitations in CQL make nested structures hard to work with; it
> >>>>> may be worthwhile to expand CQL support for nested structures.
> >>>>>
> >>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> >>>>> be reusable outside of yaml parsing, 2) support setters (we currently do,
> >>>>> but setters must be snake case… I fixed that), 3) support both nested and
> >>>>> structured, 4) support ignoring fields in a consistent way (Settings vtable
> >>>>> will include things SnakeYAML won’t and vice versa).
> >>>>>
> >>>>> https://github.com/apache/cassandra/pull/1335.  This PR is NOT a final
> >>>>> ready to merge thing, but instead a POC to show how we can solve a lot of
> >>>>> the core problems in a consistent and reusable manner.
> >>>>>
> >>>>> The following cassandra.yaml was used to show both worlds would work
> >>>>> fine in the config (and complement each other)
> >>>>>
> >>>>> track_warnings:
> >>>>>   enabled: true
> >>>>>   # nested relative to the local level (TrackWarnings)
> >>>>>   coordinator_read_size.warn_threshold_kb: 1024
> >>>>>   local_read_size.abort_threshold_kb: 1024
> >>>>>   row_index_size:
> >>>>>     warn_threshold_kb: 1024
> >>>>>     abort_threshold_kb: 1024
> >>>>> # nested relative to the top level
> >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> >>>>>
> >>>>> For the “Settings” vtable, a new Loader interface was added to get all
> >>>>> the properties, and Properties.flatten would turn every property into a
> >>>>> “flatten” version (isScalar (isPrimitive or not hasSubProperties) or
> >>>>> isCollection).  This doesn’t solve 100% of the issues that vtable has
> >>>>> (types such as Duration would need additional translation as they are
> >>>>> Scalar but need a translation from String -> Duration), and doesn’t solve
> >>>>> the fact the table currently uses “_”.
> >>>>>
> >>>>>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
> >>>>>>
> >>>>>> I meant to imply we should improve our UDT usability to support this
> >>>>>> kind of querying, essentially – but that if we support a simple
> >>>>>> text->property setup we might want to offer LIKE support so we can search
> >>>>>> them (via simple filtering, not any index) – which is actually pretty easy
> >>>>>> to provide.
> >>>>>>
> >>>>>> I think we should aim to provide users all the facilities they need to
> >>>>>> interact with config via vtables. If the user requires external tooling, it
> >>>>>> suggests a weakness in CQL that we should address, and maybe help the user
> >>>>>> in other scenarios too…
> >>>>>>
> >>>>>> From: Joseph Lynch <joe.e.ly...@gmail.com>
> >>>>>> Date: Monday, 29 November 2021 at 17:32
> >>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
> >>>>>> <bened...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Maybe we can make our query language more expressive 😊
> >>>>>>>
> >>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to
> >>>>>>> find/discover flattened config parameters?
> >>>>>>
> >>>>>> This sounds more complicated than just having the settings virtual
> >>>>>> table return text (dot encoded) -> text (json) and probably not even
> >>>>>> that much more useful. A full table scan on the settings table could
> >>>>>> return all top level keys (strings before the first dot) and if we
> >>>>>> just return a valid json string then users can bring their own
> >>>>>> querying capabilities via jq [1], or one line of code in almost any
> >>>>>> programming language (especially python, perl, etc ...).
> >>>>>>
> >>>>>> Alternatively if we want to modify the grammar it seems supporting
> >>>>>> structured data querying on text fields would maybe be preferable
> >>>>>> to LIKE since you could get what you want without a grammar change and
> >>>>>> if we could generalize to any text column it would be amazingly useful
> >>>>>> elsewhere to users. For example, we could emulate jq's query syntax in
> >>>>>> the select which is, imo, best-in-class for quickly querying into
> >>>>>> nested structures. Assuming a key (text) -> value (json) schema:
> >>>>>>
> >>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> >>>>>>
> >>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> To have exactly jq syntax (but harder to parse) it would be:
> >>>>>>
> >>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> Since we're not indexing the structured data in any way, filtering
> >>>>>> before selection probably doesn't give us much performance improvement
> >>>>>> as we'd still have to parse the whole text field in most cases.
> >>>>>>
> >>>>>> -Joey
> >>>>>>
> >>>>>> [1] https://stedolan.github.io/jq/
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
