Thanks everyone for the feedback! If I am reading this correctly, I am seeing
the following:

* Good with nested configs
* Good with the YAML layer also supporting a flat structure (e.g. foo.bar.baz
for the path foo: {bar: {baz: 42}}). How this relates to the Settings table
still needs to be resolved, but there are open tickets for this (enhance our
YAML in CASSANDRA-17166, and support updates to the Settings vtable in
CASSANDRA-15254)
* Where/how we group is an open question; maybe we move this to a JIRA as
follow-up work to CASSANDRA-15234?
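
To make the flat/nested equivalence in the second point concrete, here is a rough Python sketch. It is only an illustration of the idea and not the CASSANDRA-17166 patch (which is Java); the names flatten, expand and _merge are made up for this example, and treating a duplicate or conflicting key as an error is my assumption rather than settled behavior.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {'foo': {'bar': {'baz': 42}}} into {'foo.bar.baz': 42}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Scalars, lists and empty maps are treated as leaves.
            flat[path] = value
    return flat


def _merge(target: Dict[str, Any], key: str, value: Any, path: str) -> None:
    """Set target[key], merging maps and rejecting duplicate leaves."""
    if isinstance(value, dict) and isinstance(target.get(key), dict):
        for child_key, child_value in value.items():
            _merge(target[key], child_key, child_value, f"{path}.{child_key}")
    elif key in target:
        # Assumption: defining the same setting twice is an error.
        raise ValueError(f"duplicate setting: {path}")
    else:
        target[key] = value


def expand(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dotted keys into nested maps; flat and nested forms may be mixed."""
    nested: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand(value)
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} conflicts with a scalar at {part!r}")
        _merge(node, leaf, value, key)
    return nested


if __name__ == "__main__":
    print(flatten({"foo": {"bar": {"baz": 42}}}))
    print(expand({"foo.bar.baz": 42}))
```

With something like this, a file that mixes both styles (a nested track_warnings block plus a top-level track_warnings.coordinator_read_size.abort_threshold_kb key) expands to a single nested map, and flatten gives the dotted names a Settings-style view could expose.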

> We’re also mixing terminology already, with limits/thresholds and fail/abort.

I spoke with Ekaterina about this, and it is not solved in 15234; let's move
this to a follow-up JIRA for 15234?

> On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e.dimitr...@gmail.com> 
> wrote:
> 
> Thank you for confirming as I misread your email at first :-)
> I had a chat with David last week and I don’t think his plan is reworking
> of 15234 but incremental improvements on top of it.
> Regarding config, after spending time cleaning around and looking more into
> detail my only appeal is:
> - Centralized management and not 5 places to change things when you add new
> config so we are less error-prone
> - Documenting things for people who add new config or for our users (I
> promised and I will do it for 15234 but it will be good to continue doing
> it with any further changes down the road)
> - be careful with breaking changes
> 
> Thank you
> Ekaterina
> 
> On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org>
> wrote:
> 
>> I mean that it has been waiting for months, is ready to go, and I don’t
>> want to hold you up any longer.
>> 
>> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
>> Date: Tuesday, 30 November 2021 at 13:44
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> “IMO 15234 has sailed – it’s been held up for a long time, and was brought
>> to this list for discussion with no engagement. Ekaterina is long overdue
>> being able to commit her work.”
>> 
>> 
>> Sailed? I submitted the patch a week ago for review. Not sure how to
>> understand this statement. Can you elaborate, please?
>> 
>> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org>
>> wrote:
>> 
>>> The problem with scoping this to “features” is that we end up with at best
>>> local coherence. The config file as a whole will end up just as incoherent
>>> through its design evolution as it has historically.
>>> 
>>> If you take a look at my proposed layout for the overall config, there is
>>> a “limits” section that specifies thresholds for reporting warnings and
>>> errors for various scenarios. In this case, we probably don’t also want
>>> per-feature limits? We’re also mixing terminology already, with
>>> limits/thresholds and fail/abort.
>>> 
>>> It’s a lot of work to come up with a coherent and intuitive config layout.
>>> We probably want to at least create some documentation in-tree stipulating
>>> terminology with respect to plurals, verbs/nouns, and specific terms
>>> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
>>> common end goal for the config file.
>>> 
>>>> leave non-features to CASSANDRA-15234
>>> 
>>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
>>> to this list for discussion with no engagement. Ekaterina is long overdue
>>> being able to commit her work.
>>> 
>>> 
>>> From: David Capwell <dcapw...@apple.com.INVALID>
>>> Date: Monday, 29 November 2021 at 23:44
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>> but I would hate to repeat the mistakes of our past by evolving the
>>>> config in a new direction without any coherent overarching design.
>>> 
>>> At the start I asked to keep the thread local to new features, but to
>>> better flesh out an “overarching design” maybe we should increase the
>>> “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 -
>>> Standardise config and JVM parameters)? Aka, do we think the following is
>>> more ideal (configs scoped to a feature)
>>> 
>>> hinted_handoff:
>>>  enabled: true
>>>  disabled_datacenters:
>>>    - DC1
>>>    - DC2
>>>  max_window: 3h
>>>  flush_period: 10s
>>>  max_file_size: 128mb
>>>  compression:
>>>    class_name: LZ4Compressor
>>>    parameters:
>>>      a: b
>>> 
>>> track_warnings:
>>>  enabled: true
>>>  local_read_size:
>>>    warn_threshold: 1mb
>>>    abort_threshold: 10mb
>>>  coordinator_read_size:
>>>    warn_threshold: 5mb
>>>    abort_threshold: 20mb
>>> 
>>> 
>>> OR
>>> 
>>> # I had to rename hint configs as there was 0 consistent naming
>>> hinted_handoff_enabled: true
>>> hinted_handoff_disabled_datacenters:
>>>  - 'DC1'
>>>  - 'DC2'
>>> hinted_handoff_max_window: 3h
>>> hinted_handoff_max_file_size: 128mb
>>> hinted_handoff_flush_period: 10s
>>> hinted_handoff_compression:
>>>  class_name: LZ4Compressor
>>>  parameters:
>>>    a: b
>>> 
>>> track_warnings_enabled: true
>>> track_warnings_local_read_size_warn_threshold: 1mb
>>> track_warnings_local_read_size_abort_threshold: 10mb
>>> track_warnings_coordinator_read_size_warn_threshold: 5mb
>>> track_warnings_coordinator_read_size_abort_threshold: 20mb
>>> 
>>> 
>>> The main issue I have with flat structure is that we have no way to
>>> enforce standard naming; if you look at the hint example there were at
>>> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
>>> actually maintain that?). And one of the core reasons track_warnings went
>>> nested was that warn/abort sometimes became warn/fail and threshold
>>> sometimes was thresholds… By embracing nested structure we can actually
>>> enforce consistency; with flat we have no way to maintain consistency.
>>> 
>>> Additionally, by embracing the nested structure we can accept a flat one as
>>> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
>>> get the consistency of nested, and the “grep” benefits of flat.
>>> 
>>> 
>>>> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
>>>> 
>>>> If we’re thinking of moving towards nested configuration, then before
>>>> employing the approach further we would ideally consider what a fully
>>>> nested config looks like for the project. Ekaterina has done a lot to clean
>>>> up inconsistent naming, but I would hate to repeat the mistakes of our past
>>>> by evolving the config in a new direction without any coherent overarching
>>>> design.
>>>> 
>>>> In case anyone missed it in the earlier discussion, this was my attempt
>>>> to prototype a nested config:
>>>> 
>>>> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>>>> 
>>>> I don’t have any specific attachment to it, but settling on some
>>>> approximate scheme would be helpful IMO.
>>>> 
>>>> From: David Capwell <dcapw...@apple.com.INVALID>
>>>> Date: Monday, 29 November 2021 at 20:38
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>> What should our default example cassandra.yaml file use (flat or
>>>>> nested)?  Currently default shows nested
>>>> 
>>>> Was told this statement was confusing, so trying to clarify. At the
>>>> moment we do not allow a nested config to be expressed in any way outside
>>>> of nesting it (excluding YAML’s ability to inline objects), so if we did
>>>> allow flat config representation of nested configs, then this would be a
>>>> brand new feature; we currently show the nested structure in
>>>> cassandra.yaml
>>>> 
>>>>> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID>
>>>>> wrote:
>>>>> 
>>>>> Thanks everyone for the comments, I hope below is a good summary of all
>>>>> the talking points?
>>>>> 
>>>>> * We already use nested configs (networking, seed provider, commit
>>>>> log/hint compression, back pressure, etc.)
>>>>> * Flat configs are easier for grep, but this can be solved with grep -A/-B
>>>>> and/or yq
>>>>> * It would be possible to support flat versions of our configs in
>>>>> cassandra.yaml (in addition to the nested versions)
>>>>> * "Settings" vtable currently uses the "_" separator (example of
>>>>> encryption/audit log). Switching to "." would be a change in behavior
>>>>> which may impact some users
>>>>> * "." separator for nested configs is common in other systems (yq,
>>>>> elastic search, etc.)
>>>>> * "Structured / nested config is easier for human eyes to read"... "Flat
>>>>> config is harder for human eyes but easy for simple scripts"
>>>>> * For learning what configs are enabled, cassandra.yaml isn't the best
>>>>> interface as it may not reflect the actual configs; we can better expose
>>>>> this in CQL and/or Sidecar
>>>>> * What should our default example cassandra.yaml file use (flat or
>>>>> nested)? Currently default shows nested
>>>>> * When projecting the Config into CQL, we may want to consider UDTs to
>>>>> represent the complex types
>>>>> * Current limitations in CQL make nested structures hard to work with; it
>>>>> may be worthwhile to expand CQL support for nested structures.
>>>>> 
>>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
>>>>> be reusable outside of yaml parsing, 2) support setters (we currently do,
>>>>> but setters must be snake case… I fixed that), 3) support both nested and
>>>>> structured, 4) support ignoring fields in a consistent way (Settings vtable
>>>>> will include things SnakeYAML won’t and vice versa).
>>>>> 
>>>>> <https://github.com/apache/cassandra/pull/1335>. This PR is NOT a final
>>>>> ready to merge thing, but instead a POC to show how we can solve a lot of
>>>>> the core problems in a consistent and reusable manner.
>>>>> 
>>>>> The following cassandra.yaml was used to show both worlds would work
>>>>> fine in the config (and complement each other)
>>>>> 
>>>>> track_warnings:
>>>>>   enabled: true
>>>>>   # nested relative to the local level (TrackWarnings)
>>>>>   coordinator_read_size.warn_threshold_kb: 1024
>>>>>   local_read_size.abort_threshold_kb: 1024
>>>>>   row_index_size:
>>>>>     warn_threshold_kb: 1024
>>>>>     abort_threshold_kb: 1024
>>>>> # nested relative to the top level
>>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
>>>>> 
>>>>> For the “Settings” vtable, a new Loader interface was added to get all
>>>>> the properties, and Properties.flatten would turn every property into a
>>>>> “flattened” version (isScalar (isPrimitive or not hasSubProperties) or
>>>>> isCollection). This doesn’t solve 100% of the issues that the vtable has
>>>>> (types such as Duration would need additional translation as they are
>>>>> Scalar but need a translation from String -> Duration), and doesn’t solve
>>>>> the fact the table currently uses “_”.
>>>>> 
>>>>>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
>>>>>> 
>>>>>> I meant to imply we should improve our UDT usability to support this
>>>>>> kind of querying, essentially – but that if we support a simple
>>>>>> text->property setup we might want to offer LIKE support so we can search
>>>>>> them (via simple filtering, not any index) – which is actually pretty easy
>>>>>> to provide.
>>>>>> 
>>>>>> I think we should aim to provide users all the facilities they need to
>>>>>> interact with config via vtables. If the user requires external tooling, it
>>>>>> suggests a weakness in CQL that we should address, and maybe help the user
>>>>>> in other scenarios too…
>>>>>> 
>>>>>> From: Joseph Lynch <joe.e.ly...@gmail.com>
>>>>>> Date: Monday, 29 November 2021 at 17:32
>>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
>>>>>> <bened...@apache.org> wrote:
>>>>>>> 
>>>>>>> Maybe we can make our query language more expressive 😊
>>>>>>> 
>>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to
>>>>>>> find/discover flattened config parameters?
>>>>>> 
>>>>>> This sounds more complicated than just having the settings virtual
>>>>>> table return text (dot encoded) -> text (json) and probably not even
>>>>>> that much more useful. A full table scan on the settings table could
>>>>>> return all top level keys (strings before the first dot) and if we
>>>>>> just return a valid json string then users can bring their own
>>>>>> querying capabilities via jq [1], or one line of code in almost any
>>>>>> programming language (especially python, perl, etc ...).
>>>>>> 
>>>>>> Alternatively if we want to modify the grammar it seems supporting
>>>>>> structured data querying on text fields would maybe be preferable
>>>>>> to LIKE since you could get what you want without a grammar change and
>>>>>> if we could generalize to any text column it would be amazingly useful
>>>>>> elsewhere to users. For example, we could emulate jq's query syntax in
>>>>>> the select which is, imo, best-in-class for quickly querying into
>>>>>> nested structures. Assuming a key (text) -> value (json) schema:
>>>>>> 
>>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
>>>>>> 
>>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>>>>>> 
>>>>>> To have exactly jq syntax (but harder to parse) it would be:
>>>>>> 
>>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>>>>>> 
>>>>>> Since we're not indexing the structured data in any way, filtering
>>>>>> before selection probably doesn't give us much performance improvement
>>>>>> as we'd still have to parse the whole text field in most cases.
>>>>>> 
>>>>>> -Joey
>>>>>> 
>>>>>> [1] https://stedolan.github.io/jq/
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 


