Please find my comments inline below.

* Good with nested configs
- The question is how they will be introduced and maintained. I wouldn't advocate for maintaining more than one yaml file, but probably, as you mentioned some time ago (if I remember correctly), having one format as the default and just documenting the support for the other one(s). Which one is the default is a different topic.

* Where/How we group is an open question, maybe we move this to a JIRA as follow up work to CASSANDRA-15234?
- Not part of CASSANDRA-15234 as per all the discussions; that ticket is already in review (thank you for your first quick round btw, appreciate it!)

Spoke with Ekaterina about this, and not solved in 15234; let's move this to a follow up JIRA for 15234?
- For the broader audience: what I currently solve around naming in CASSANDRA-15234 is removing the unit suffix and moving the config parameter names to the noun_verb format. After all the discussions, and realizing the great interest and variety of opinions, I really tried to split more tickets out of CASSANDRA-15234 and to keep primarily the new custom types and the new framework with backward compatibility as the main body of work, which is also good for the reviewers. Last year I came up with the idea of reorganizing the config file a bit, which led to discussions. So, given my previous point about moving to a more incremental approach because of the variety of opinions, I suggested when submitting for review that we open a new ticket for that new organization of our config file. Probably we can add the abort/fail and any other similar concerns/questions there, post CASSANDRA-15234?

On Fri, 3 Dec 2021 at 13:34, David Capwell <dcapw...@apple.com.invalid> wrote:

> Thanks everyone for the feedback! If I am reading this properly I am seeing the following
>
> * Good with nested configs
> * Good with YAML layer supporting flat structure (possible foo.bar.baz for the path foo: {bar: {baz: 42}}), how this relates with Settings table should be resolved, but there is an open ticket for this (enhance our YAML CASSANDRA-17166, and support updates to Settings vtable CASSANDRA-15254)
> * Where/How we group is an open question, maybe we move this to a JIRA as follow up work to CASSANDRA-15234?
>
> > We’re also mixing terminology already, with limits/thresholds and fail/abort.
>
> Spoke with Ekaterina about this, and not solved in 15234; let's move this to a follow up JIRA for 15234?
>
> > On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
> >
> > Thank you for confirming as I misread your email at first :-)
> > I had a chat with David last week and I don’t think his plan is reworking of 15234 but incremental improvements on top of it.
> > Regarding config, after spending time cleaning around and looking more into detail my only appeal is:
> > - Centralized management and not 5 places to change things when you add new config so we are less error-prone
> > - Documenting things for people who add new config or for our users (I promised and I will do it for 15234 but it will be good to continue doing it with any further changes down the road)
> > - be careful with breaking changes
> >
> > Thank you
> > Ekaterina
> >
> > On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org> wrote:
> >
> >> I mean that it has been waiting for months, is ready to go, and I don’t want to hold you up any longer.
> >>
> >> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> >> Date: Tuesday, 30 November 2021 at 13:44
> >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> “IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.”
> >>
> >>
> >> Sailed? I submitted the patch a week ago for review. Not sure how to understand this statement. Can you elaborate, please?
> >>
> >> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org> wrote:
> >>
> >>> The problem with scoping this to “features” is that we end up with at best local coherence. The config file as a whole will end up just as incoherent through its design evolution as it has historically.
> >>>
> >>> If you take a look at my proposed layout for the overall config, there is a “limits” section that specifies thresholds for reporting warnings and errors for various scenarios. In this case, we probably don’t also want per-feature limits? We’re also mixing terminology already, with limits/thresholds and fail/abort.
> >>>
> >>> It’s a lot of work to come up with a coherent and intuitive config layout. We probably want to at least create some documentation in-tree stipulating terminology with respect to plurals, verbs/nouns, and specific terms (period, abort, limit, datacenter vs dc, etc), but ideally we would have a common end goal for the config file.
> >>>
> >>>> leave non-features to CASSANDRA-15234
> >>>
> >>> IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.
> >>>
> >>>
> >>> From: David Capwell <dcapw...@apple.com.INVALID>
> >>> Date: Monday, 29 November 2021 at 23:44
> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>> but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
> >>>
> >>> At the start I asked to keep the thread local to new features, but to more flesh out an “overarching design” maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)?
> >>> Aka, do we think the following is more ideal (configs scoped to a feature)
> >>>
> >>> hinted_handoff:
> >>>   enabled: true
> >>>   disabled_datacenters:
> >>>     - DC1
> >>>     - DC2
> >>>   max_window: 3h
> >>>   flush_period: 10s
> >>>   max_file_size: 128mb
> >>>   compression:
> >>>     class_name: LZ4Compressor
> >>>     parameters:
> >>>       a: b
> >>>
> >>> track_warnings:
> >>>   enabled: true
> >>>   local_read_size:
> >>>     warn_threshold: 1mb
> >>>     abort_threshold: 10mb
> >>>   coordinator_read_size:
> >>>     warn_threshold: 5mb
> >>>     abort_threshold: 20mb
> >>>
> >>>
> >>> OR
> >>>
> >>> # I had to rename hint configs as there was 0 consistent naming
> >>> hinted_handoff_enabled: true
> >>> hinted_handoff_disabled_datacenters:
> >>>   - 'DC1'
> >>>   - 'DC2'
> >>> hinted_handoff_max_window: 3h
> >>> hinted_handoff_max_file_size: 128mb
> >>> hinted_handoff_flush_period: 10s
> >>> hinted_handoff_compression:
> >>>   class_name: LZ4Compressor
> >>>   parameters:
> >>>     a: b
> >>>
> >>> track_warnings_enabled: true
> >>> track_warnings_local_read_size_warn_threshold: 1mb
> >>> track_warnings_local_read_size_abort_threshold: 10mb
> >>> track_warnings_coordinator_read_size_warn_threshold: 5mb
> >>> track_warnings_coordinator_read_size_abort_threshold: 20mb
> >>>
> >>>
> >>> The main issue I have with flat structure is that we have no way to enforce standard naming; if you look at the hint example there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?). And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail and threshold sometimes was thresholds…. By embracing nested structure we can actually enforce consistency, with flat we have no way to maintain consistency.
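
To make the comparison above concrete, here is a minimal Python sketch (not code from the thread or from Cassandra) that derives the flat track_warnings names from the nested layout by joining key paths. The flatten helper and its "_" default separator are assumptions made only for illustration, and the dict literal stands in for what a YAML parser would return.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], sep: str = "_", prefix: str = "") -> Dict[str, Any]:
    """Turn a nested mapping into a flat one by joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the path so far as the prefix.
            flat.update(flatten(value, sep, name))
        else:
            # Scalars and lists are kept as leaves.
            flat[name] = value
    return flat


# Hypothetical in-memory form of the nested track_warnings section above.
nested = {
    "track_warnings": {
        "enabled": True,
        "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
        "coordinator_read_size": {"warn_threshold": "5mb", "abort_threshold": "20mb"},
    }
}

flat = flatten(nested)
for name, value in flat.items():
    print(f"{name}: {value}")
```

Since the flat names here are generated from the nesting, spellings such as warn_threshold and abort_threshold are defined once per group rather than once per key, which is one way to read the consistency argument made above.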
> >>>
> >>> Additionally by embracing the nested structure we can accept a flat one as well (PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested, and the “grep” benefits of flat.
> >>>
> >>>
> >>>> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> >>>>
> >>>> If we’re thinking of moving towards nested configuration, then before employing the approach further we would ideally consider what a fully nested config looks like for the project. Ekaterina has done a lot to clean up inconsistent naming, but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
> >>>>
> >>>> In case anyone missed it in the earlier discussion, this was my attempt to prototype a nested config: https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >>>>
> >>>> I don’t have any specific attachment to it, but settling on some approximate scheme would be helpful IMO.
> >>>>
> >>>> From: David Capwell <dcapw...@apple.com.INVALID>
> >>>> Date: Monday, 29 November 2021 at 20:38
> >>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>> What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
> >>>>
> >>>> Was told this statement was confusing, so trying to clarify.
> >>>> At the moment we do not allow a nested config to be expressed in any way outside of nesting it (excluding YAML’s ability to inline objects), so if we did allow flat config representation of nested configs, then this would be a brand new feature; we currently show the nested structure in cassandra.yaml
> >>>>
> >>>>> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID> wrote:
> >>>>>
> >>>>> Thanks everyone for the comments, I hope below is a good summary of all the talking points?
> >>>>>
> >>>>> * We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
> >>>>> * Flat configs are easier for grep, but can be solved with grep -A/-B and/or yq
> >>>>> * It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
> >>>>> * "Settings" vtable currently uses the "_" separator (example of encryption/audit log). Switching to "." would be a change in behavior which may impact some users
> >>>>> * "." separator for nested configs is common in other systems (yq, elastic search, etc.)
> >>>>> * "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
> >>>>> * For learning what configs are enabled, cassandra.yaml isn't the best interface as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
> >>>>> * What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
> >>>>> * When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
> >>>>> * Current limitations in CQL make nested structures hard to work with, it may be worthwhile to expand CQL support for nested structures.
> >>>>>
> >>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of yaml parsing, 2) support setters (we currently do, but setters must be snake case… I fixed that)…, 3) support both nested and structured, 4) support ignoring fields in a consistent way (Settings vtable will include things SnakeYAML won’t and vice versa).
> >>>>>
> >>>>> https://github.com/apache/cassandra/pull/1335. This PR is NOT a final ready to merge thing, but instead a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.
> >>>>>
> >>>>> The following cassandra.yaml was used to show both worlds would work fine in the config (and complement each other)
> >>>>>
> >>>>> track_warnings:
> >>>>>   enabled: true
> >>>>>   # nested relative to the local level (TrackWarnings)
> >>>>>   coordinator_read_size.warn_threshold_kb: 1024
> >>>>>   local_read_size.abort_threshold_kb: 1024
> >>>>>   row_index_size:
> >>>>>     warn_threshold_kb: 1024
> >>>>>     abort_threshold_kb: 1024
> >>>>> # nested relative to the top level
> >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> >>>>>
> >>>>> For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten would turn every property into a “flatten” version (isScalar (isPrimitive or not hasSubProperties) or isCollection). This doesn’t solve 100% of the issues that vtable has (types such as Duration would need additional translation as they are Scalar but need a translation from String -> Duration), and doesn’t solve the fact the table currently uses “_”.
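
As a rough illustration of how dotted keys and nested sections could coexist as in the cassandra.yaml above, here is a small Python sketch. It is not the logic from the PR: the expand and merge helpers are hypothetical names, and the dict literal stands in for what a YAML parser would hand back for that file.

```python
from typing import Any, Dict


def merge(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Recursively copy `source` into `target`, merging nested mappings."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def expand(config: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys such as 'a.b: 1' into nested mappings, at every level."""
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand(value)
        # Walk (or create) the intermediate mappings named by the dotted path.
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} conflicts with a scalar at {part!r}")
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            merge(node[leaf], value)
        else:
            node[leaf] = value
    return result


# Hypothetical parsed form of the mixed nested/dotted example above.
loaded = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}

expanded = expand(loaded)
print(expanded["track_warnings"]["coordinator_read_size"])
```

In this sketch a dotted key relative to the local level and one relative to the top level end up in the same coordinator_read_size mapping. How conflicts between the two forms should be resolved is left open here; the sketch simply lets the later key win.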
> >>>>>
> >>>>>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
> >>>>>>
> >>>>>> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
> >>>>>>
> >>>>>> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and maybe help the user in other scenarios too…
> >>>>>>
> >>>>>> From: Joseph Lynch <joe.e.ly...@gmail.com>
> >>>>>> Date: Monday, 29 November 2021 at 17:32
> >>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org <bened...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Maybe we can make our query language more expressive 😊
> >>>>>>>
> >>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
> >>>>>>
> >>>>>> This sounds more complicated than just having the settings virtual table return text (dot encoded) -> text (json) and probably not even that much more useful. A full table scan on the settings table could return all top level keys (strings before the first dot) and if we just return a valid json string then users can bring their own querying capabilities via jq [1], or one line of code in almost any programming language (especially python, perl, etc ...).
> >>>>>>
> >>>>>> Alternatively if we want to modify the grammar it seems supporting structured data querying on text fields would maybe be more preferable to LIKE since you could get what you want without a grammar change and if we could generalize to any text column it would be amazingly useful elsewhere to users. For example, we could emulate jq's query syntax in the select which is, imo, best-in-class for quickly querying into nested structures. Assuming a key (text) -> value (json) schema:
> >>>>>>
> >>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> >>>>>>
> >>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> To have exactly jq syntax (but harder to parse) it would be:
> >>>>>>
> >>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> Since we're not indexing the structured data in any way, filtering before selection probably doesn't give us much performance improvement as we'd still have to parse the whole text field in most cases.
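
The client-side variant of this idea (a settings row holding a JSON string that the user queries with a line or two of code) might look like the following Python sketch. The query helper and the settings dict are hypothetical stand-ins; json(value) in the SELECT statements above is proposed syntax, not an existing CQL function. The stored value is written as valid JSON with double quotes, unlike the single-quoted shorthand in the example row.

```python
import json
import re
from typing import Any

# A path token is either a bare name/number ("b", "0") or a bracketed index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def query(json_text: str, path: str) -> Any:
    """Resolve a path such as 'b.0.c.d' or 'b[0].c.d' against a JSON document."""
    node = json.loads(json_text)  # the whole text field is parsed before any selection
    for name, index in _TOKEN.findall(path):
        token = name or index
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


# Hypothetical stand-in for a row of a key (text) -> value (json) settings table.
settings = {"a": '{"b": [{"c": {"d": 4}}]}'}

print(query(settings["a"], "b.0.c.d"))   # dotted index form
print(query(settings["a"], "b[0].c.d"))  # jq-style bracket form
```

Both path spellings resolve to the same value, 4, in this sketch. Because json.loads runs before any path step is applied, it also mirrors the point that selecting into the structure does not avoid parsing the full text value.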
> >>>>>>
> >>>>>> -Joey
> >>>>>>
> >>>>>> [1] https://stedolan.github.io/jq/
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org