Thanks everyone for the feedback! If I am reading this correctly, I see the following:
* Good with nested configs
* Good with the YAML layer supporting a flat structure (possibly foo.bar.baz for the path foo: {bar: {baz: 42}}); how this relates to the Settings table still needs to be resolved, but there are open tickets for this (enhance our YAML: CASSANDRA-17166; support updates to the Settings vtable: CASSANDRA-15254)
* Where/how we group is an open question; maybe we move this to a JIRA as follow-up work to CASSANDRA-15234?

> We’re also mixing terminology already, with limits/thresholds and fail/abort.

Spoke with Ekaterina about this, and it is not solved in 15234; let's move this to a follow-up JIRA for 15234?

> On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>
> Thank you for confirming, as I misread your email at first :-)
> I had a chat with David last week and I don’t think his plan is a rework of 15234, but rather incremental improvements on top of it.
> Regarding config, after spending time cleaning up and looking into more detail, my only appeals are:
> - Centralized management, not 5 places to change things when you add a new config, so we are less error-prone
> - Documenting things for people who add new config or for our users (I promised, and I will do it for 15234, but it would be good to continue doing it with any further changes down the road)
> - Be careful with breaking changes
>
> Thank you
> Ekaterina
>
> On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org> wrote:
>
>> I mean that it has been waiting for months, is ready to go, and I don’t want to hold you up any longer.
>>
>> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
>> Date: Tuesday, 30 November 2021 at 13:44
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> “IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.”
>>
>> Sailed?
>> I submitted the patch a week ago for review. Not sure how to understand this statement. Can you elaborate, please?
>>
>> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org> wrote:
>>
>>> The problem with scoping this to “features” is that we end up with at best local coherence. The config file as a whole will end up just as incoherent through its design evolution as it has been historically.
>>>
>>> If you take a look at my proposed layout for the overall config, there is a “limits” section that specifies thresholds for reporting warnings and errors for various scenarios. In this case, we probably don’t also want per-feature limits? We’re also mixing terminology already, with limits/thresholds and fail/abort.
>>>
>>> It’s a lot of work to come up with a coherent and intuitive config layout. We probably want to at least create some in-tree documentation stipulating terminology with respect to plurals, verbs/nouns, and specific terms (period, abort, limit, datacenter vs dc, etc.), but ideally we would have a common end goal for the config file.
>>>
>>>> leave non-features to CASSANDRA-15234
>>>
>>> IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.
>>>
>>>
>>> From: David Capwell <dcapw...@apple.com.INVALID>
>>> Date: Monday, 29 November 2021 at 23:44
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>> but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
>>>
>>> At the start I asked to keep the thread local to new features, but to flesh out an “overarching design” more, maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)? That is, do we think the following is more ideal (configs scoped to a feature):
>>>
>>> hinted_handoff:
>>>   enabled: true
>>>   disabled_datacenters:
>>>     - DC1
>>>     - DC2
>>>   max_window: 3h
>>>   flush_period: 10s
>>>   max_file_size: 128mb
>>>   compression:
>>>     class_name: LZ4Compressor
>>>     parameters:
>>>       a: b
>>>
>>> track_warnings:
>>>   enabled: true
>>>   local_read_size:
>>>     warn_threshold: 1mb
>>>     abort_threshold: 10mb
>>>   coordinator_read_size:
>>>     warn_threshold: 5mb
>>>     abort_threshold: 20mb
>>>
>>>
>>> OR
>>>
>>> # I had to rename hint configs as there was 0 consistent naming
>>> hinted_handoff_enabled: true
>>> hinted_handoff_disabled_datacenters:
>>>   - 'DC1'
>>>   - 'DC2'
>>> hinted_handoff_max_window: 3h
>>> hinted_handoff_max_file_size: 128mb
>>> hinted_handoff_flush_period: 10s
>>> hinted_handoff_compression:
>>>   class_name: LZ4Compressor
>>>   parameters:
>>>     a: b
>>>
>>> track_warnings_enabled: true
>>> track_warnings_local_read_size_warn_threshold: 1mb
>>> track_warnings_local_read_size_abort_threshold: 10mb
>>> track_warnings_coordinator_read_size_warn_threshold: 5mb
>>> track_warnings_coordinator_read_size_abort_threshold: 20mb
>>>
>>>
>>> The main issue I have with a flat structure is that we have no way to enforce standard naming; if you look at the hint example, there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?). And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail, and threshold was sometimes thresholds… By embracing a nested structure we can actually enforce consistency; with flat we have no way to maintain consistency.
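To make the consistency argument above concrete, here is a minimal Python sketch. It is an illustration only: Cassandra's real config loading is Java/SnakeYAML, and `Threshold` and `parse_threshold` are hypothetical names, not Cassandra classes. Every limit reuses one nested type with fixed field names, so a variant such as `fail_threshold` is rejected at load time instead of quietly becoming a new naming convention.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Threshold:
    """Hypothetical shared type: every limit uses exactly these two names."""
    warn_threshold: str
    abort_threshold: str


def parse_threshold(raw: dict) -> Threshold:
    """Build a Threshold from an already-parsed nested YAML mapping.

    Unknown keys raise ValueError; missing keys surface as a TypeError
    from the dataclass constructor.
    """
    allowed = {f.name for f in fields(Threshold)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown keys {sorted(unknown)}; expected {sorted(allowed)}")
    return Threshold(**raw)


# Nested track_warnings example from the thread, as a dict after YAML parsing.
track_warnings = {
    "enabled": True,
    "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
    "coordinator_read_size": {"warn_threshold": "5mb", "abort_threshold": "20mb"},
}

local = parse_threshold(track_warnings["local_read_size"])
coordinator = parse_threshold(track_warnings["coordinator_read_size"])
print(local.abort_threshold, coordinator.warn_threshold)  # 10mb 5mb

try:
    # A drifted spelling is caught instead of becoming a third convention.
    parse_threshold({"warn_threshold": "1mb", "fail_threshold": "10mb"})
except ValueError as err:
    print("rejected:", err)
```

With a flat layout, the same check would have to pattern-match key suffixes across every feature prefix.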
>>>
>>> Additionally, by embracing the nested structure we can accept a flat one as well (the PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested and the “grep” benefits of flat.
>>>
>>>
>>>> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
>>>>
>>>> If we’re thinking of moving towards nested configuration, then before employing the approach further we would ideally consider what a fully nested config looks like for the project. Ekaterina has done a lot to clean up inconsistent naming, but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
>>>>
>>>> In case anyone missed it in the earlier discussion, this was my attempt to prototype a nested config:
>>>> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>>>>
>>>> I don’t have any specific attachment to it, but settling on some approximate scheme would be helpful IMO.
>>>>
>>>> From: David Capwell <dcapw...@apple.com.INVALID>
>>>> Date: Monday, 29 November 2021 at 20:38
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>> What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
>>>>
>>>> Was told this statement was confusing, so trying to clarify.
>>>> At the moment we do not allow a nested config to be expressed in any way other than nesting it (excluding YAML’s ability to inline objects), so if we did allow a flat representation of nested configs, this would be a brand-new feature; we currently show the nested structure in cassandra.yaml.
>>>>
>>>>> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID> wrote:
>>>>>
>>>>> Thanks everyone for the comments; I hope the below is a good summary of all the talking points?
>>>>>
>>>>> We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
>>>>> Flat configs are easier to grep, but this can be solved with grep -A/-B and/or yq
>>>>> It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
>>>>> The "Settings" vtable currently uses the "_" separator (examples: encryption/audit log). Switching to "." would be a change in behavior, which may impact some users
>>>>> A "." separator for nested configs is common in other systems (yq, Elasticsearch, etc.)
>>>>> "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
>>>>> For learning what configs are enabled, cassandra.yaml isn't the best interface, as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
>>>>> What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
>>>>> When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
>>>>> Current limitations in CQL make nested structures hard to work with; it may be worthwhile to expand CQL support for nested structures.
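The "_" versus "." separator point in the summary above can be illustrated with a small Python sketch. `flatten` and `unflatten` are hypothetical helpers, not Cassandra's `Properties.flatten`, and the sketch assumes that no config name contains a dot. Joining path segments with "." can be reversed into the original nesting, while joining with "_" cannot, because the names themselves already use underscores.

```python
def flatten(config: dict, sep: str, prefix: str = "") -> dict:
    """Flatten a nested mapping into {joined_path: leaf_value} using `sep`."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))  # recurse into sub-sections
        else:
            flat[path] = value  # scalars and lists are leaves
    return flat


def unflatten(flat: dict, sep: str) -> dict:
    """Rebuild a nested mapping by splitting every key on `sep`."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = {
    "track_warnings": {
        "enabled": True,
        "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
    }
}

dotted = flatten(config, ".")
underscored = flatten(config, "_")
print(dotted["track_warnings.local_read_size.abort_threshold"])  # 10mb
print(underscored["track_warnings_local_read_size_abort_threshold"])  # 10mb

# "." round-trips because (by assumption) no config name contains a dot ...
print(unflatten(dotted, ".") == config)  # True
# ... while "_" does not, because names such as local_read_size contain underscores.
print(unflatten(underscored, "_") == config)  # False
```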
>>>>>
>>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of YAML parsing, 2) support setters (we currently do, but setters must be snake case… I fixed that), 3) support both nested and flat, 4) support ignoring fields in a consistent way (the Settings vtable will include things SnakeYAML won’t and vice versa).
>>>>>
>>>>> https://github.com/apache/cassandra/pull/1335. This PR is NOT a final, ready-to-merge thing, but instead a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.
>>>>>
>>>>> The following cassandra.yaml was used to show that both worlds would work fine in the config (and complement each other):
>>>>>
>>>>> track_warnings:
>>>>>   enabled: true
>>>>>   # nested relative to the local level (TrackWarnings)
>>>>>   coordinator_read_size.warn_threshold_kb: 1024
>>>>>   local_read_size.abort_threshold_kb: 1024
>>>>>   row_index_size:
>>>>>     warn_threshold_kb: 1024
>>>>>     abort_threshold_kb: 1024
>>>>> # nested relative to the top level
>>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
>>>>>
>>>>> For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten would turn every property into a “flattened” version (a leaf being anything that isScalar (isPrimitive or not hasSubProperties) or isCollection). This doesn’t solve 100% of the issues that vtable has (types such as Duration would need additional translation, as they are Scalar but need a translation from String -> Duration), and doesn’t solve the fact that the table currently uses “_”.
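A minimal Python sketch of how dotted keys could be expanded relative to the level where they appear and merged with the nested form, using the mixed example YAML above after parsing. `expand_dotted` and `deep_merge` are hypothetical helpers, not the Loader/Properties code from the PR; letting the later value win on conflicting scalars is an assumption of this sketch.

```python
def deep_merge(dst: dict, src: dict) -> None:
    """Merge src into dst in place, recursing where both sides are mappings.
    On conflicting non-mapping values, src wins (an assumption)."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = value


def expand_dotted(node):
    """Turn keys like 'a.b.c' into nested mappings relative to the mapping
    they appear in, merging with any nesting that is already present."""
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        value = expand_dotted(value)
        *parents, leaf = key.split(".")
        target = result
        for part in parents:
            target = target.setdefault(part, {})
        if isinstance(value, dict) and isinstance(target.get(leaf), dict):
            deep_merge(target[leaf], value)
        else:
            target[leaf] = value
    return result


# The mixed example from the thread, as it would look after YAML parsing.
raw = {
    "track_warnings": {
        "enabled": True,
        # dotted keys relative to the local level
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    # dotted key relative to the top level
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}

config = expand_dotted(raw)
print(config["track_warnings"]["coordinator_read_size"])
# {'warn_threshold_kb': 1024, 'abort_threshold_kb': 42}
```

Because the expansion happens before typed binding, the nested schema still decides which names are legal, which is how a flat spelling can be accepted without giving up the consistency of the nested form.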
>>>>>
>>>>>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
>>>>>>
>>>>>> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
>>>>>>
>>>>>> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and addressing it may help the user in other scenarios too…
>>>>>>
>>>>>> From: Joseph Lynch <joe.e.ly...@gmail.com>
>>>>>> Date: Monday, 29 November 2021 at 17:32
>>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org <bened...@apache.org> wrote:
>>>>>>>
>>>>>>> Maybe we can make our query language more expressive 😊
>>>>>>>
>>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
>>>>>>
>>>>>> This sounds more complicated than just having the settings virtual table return text (dot encoded) -> text (json), and probably not even that much more useful. A full table scan on the settings table could return all top-level keys (strings before the first dot), and if we just return a valid JSON string then users can bring their own querying capabilities via jq [1], or one line of code in almost any programming language (especially python, perl, etc ...).
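Joey's text (dot encoded) -> text (json) layout and Benedict's LIKE-style discovery can both be approximated client-side in a few lines of Python. The rows below are made up for illustration, and `like` is a simplified hypothetical stand-in that treats only `%` as a wildcard (keeping `_` literal, since config names are full of underscores); it is not CQL's LIKE implementation.

```python
import json
import re

# Hypothetical contents of a settings virtual table shaped as Joey describes:
# key = dot-encoded path (text), value = JSON-encoded setting (text).
rows = {
    "track_warnings.enabled": "true",
    "track_warnings.local_read_size": '{"warn_threshold": "1mb", "abort_threshold": "10mb"}',
    "hinted_handoff.max_window": '"3h"',
}


def top_level_keys(rows: dict) -> list:
    """What a full scan exposes: the distinct strings before the first dot."""
    return sorted({key.split(".", 1)[0] for key in rows})


def like(pattern: str, text: str) -> bool:
    """Simplified LIKE: '%' matches any run of characters; everything else,
    including '_', is matched literally."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


print(top_level_keys(rows))  # ['hinted_handoff', 'track_warnings']
print([k for k in rows if like("%.max_window", k)])  # ['hinted_handoff.max_window']

# "Bring your own querying": each JSON value is one json.loads away.
local_read_size = json.loads(rows["track_warnings.local_read_size"])
print(local_read_size["abort_threshold"])  # 10mb
```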
>>>>>>
>>>>>> Alternatively, if we want to modify the grammar, it seems supporting structured data querying on text fields would be preferable to LIKE, since you could get what you want without a grammar change, and if we could generalize it to any text column it would be amazingly useful elsewhere to users. For example, we could emulate jq's query syntax in the select, which is, imo, best-in-class for quickly querying into nested structures. Assuming a key (text) -> value (json) schema:
>>>>>>
>>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
>>>>>>
>>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>>>>>>
>>>>>> To have exactly jq syntax (but harder to parse) it would be:
>>>>>>
>>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>>>>>>
>>>>>> Since we're not indexing the structured data in any way, filtering before selection probably doesn't give us much performance improvement, as we'd still have to parse the whole text field in most cases.
>>>>>>
>>>>>> -Joey
>>>>>>
>>>>>> [1] https://stedolan.github.io/jq/
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
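The path syntax Joey proposes can be prototyped outside CQL. This Python sketch evaluates `b.0.c.d` (and the jq-style `b[0].c.d`, after rewriting brackets to dots) against a JSON value. `select_path` and `jq_to_dotted` are hypothetical names, and `json(value).b.0.c.d` is syntax proposed in the thread rather than something this sketch assumes CQL supports. The sample value uses double quotes because the single-quoted form in the example is not valid JSON.

```python
import json


def jq_to_dotted(path: str) -> str:
    """Rewrite jq-style indexing (b[0].c.d) into the dotted form (b.0.c.d)."""
    return path.replace("[", ".").replace("]", "")


def select_path(json_text: str, path: str):
    """Evaluate a dotted path against a JSON document.

    Numeric segments index into arrays; every other segment is an object key.
    The whole text is parsed on each call, matching Joey's note that without
    an index the full field has to be parsed anyway.
    """
    node = json.loads(json_text)
    for segment in path.split("."):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node


value = '{"b": [{"c": {"d": 4}}]}'  # the value stored under key 'a'
print(select_path(value, "b.0.c.d"))  # 4
print(select_path(value, jq_to_dotted("b[0].c.d")))  # 4
```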