Thank you for confirming, as I misread your email at first :-) I had a chat with David last week, and I don’t think his plan is to rework 15234, but rather to make incremental improvements on top of it. Regarding config, after spending time cleaning things up and looking into the details, my only asks are:
- Centralized management, rather than 5 places to change things when you add a new config, so we are less error-prone
- Documentation for people who add new config and for our users (I promised, and I will do it for 15234, but it would be good to keep doing it with any further changes down the road)
- Being careful with breaking changes
Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, bened...@apache.org <bened...@apache.org> wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t want to hold you up any longer.
>
> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> Date: Tuesday, 30 November 2021 at 13:44
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> “IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.”
>
> Sailed? I submitted the patch a week ago for review. Not sure how to understand this statement. Can you elaborate, please?
>
> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org <bened...@apache.org> wrote:
>
> > The problem with scoping this to “features” is that we end up with at best local coherence. The config file as a whole will end up just as incoherent through its design evolution as it has historically.
> >
> > If you take a look at my proposed layout for the overall config, there is a “limits” section that specifies thresholds for reporting warnings and errors for various scenarios. In this case, we probably don’t also want per-feature limits? We’re also mixing terminology already, with limits/thresholds and fail/abort.
> >
> > It’s a lot of work to come up with a coherent and intuitive config layout. We probably want to at least create some documentation in-tree stipulating terminology with respect to plurals, verbs/nouns, and specific terms (period, abort, limit, datacenter vs dc, etc.), but ideally we would have a common end goal for the config file.
> >
> > > leave non-features to CASSANDRA-15234
> >
> > IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.
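Benedict’s suggestion of in-tree documentation stipulating terminology could also be backed by a mechanical check. The sketch below is a hypothetical lint, not anything that exists in Cassandra: `find_nonstandard_terms`, the `PREFERRED_TERMS` table, and the sample keys are illustrative names, and which spelling counts as preferred (e.g. `datacenter` over `dc`, `abort` over `fail`) is an assumption, since the thread does not settle it.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical terminology table (non-preferred word -> preferred word).
# The thread does not decide which spellings win; these are illustrative.
PREFERRED_TERMS: Dict[str, str] = {
    "dc": "datacenter",
    "fail": "abort",
    "thresholds": "threshold",
}


def find_nonstandard_terms(
    keys: Iterable[str], preferred: Dict[str, str] = PREFERRED_TERMS
) -> List[Tuple[str, str, str]]:
    """Return (key, offending_word, preferred_word) for each non-preferred word in each key."""
    problems: List[Tuple[str, str, str]] = []
    for key in keys:
        # Treat '_' (flat names) and '.' (nested paths) alike as word separators.
        for word in key.replace(".", "_").split("_"):
            if word in preferred:
                problems.append((key, word, preferred[word]))
    return problems


if __name__ == "__main__":
    config_keys = [
        "hinted_handoff_disabled_datacenters",
        "track_warnings.local_read_size.warn_threshold",
        "coordinator_read_size_fail_thresholds",  # hypothetical, deliberately inconsistent key
    ]
    for key, bad, good in find_nonstandard_terms(config_keys):
        print(f"{key}: prefer '{good}' over '{bad}'")
```

A word-level check like this works the same for flat and nested names, but it only catches synonyms someone has already listed, which is part of why the thread leans on structure to enforce consistency.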
> >
> > From: David Capwell <dcapw...@apple.com.INVALID>
> > Date: Monday, 29 November 2021 at 23:44
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > > but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
> >
> > At the start I asked to keep the thread local to new features, but to flesh out more of an “overarching design” maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)? In other words, do we think the following (configs scoped to a feature) is more ideal:
> >
> > hinted_handoff:
> >   enabled: true
> >   disabled_datacenters:
> >     - DC1
> >     - DC2
> >   max_window: 3h
> >   flush_period: 10s
> >   max_file_size: 128mb
> >   compression:
> >     class_name: LZ4Compressor
> >     parameters:
> >       a: b
> >
> > track_warnings:
> >   enabled: true
> >   local_read_size:
> >     warn_threshold: 1mb
> >     abort_threshold: 10mb
> >   coordinator_read_size:
> >     warn_threshold: 5mb
> >     abort_threshold: 20mb
> >
> > OR
> >
> > # I had to rename hint configs as there was 0 consistent naming
> > hinted_handoff_enabled: true
> > hinted_handoff_disabled_datacenters:
> >   - 'DC1'
> >   - 'DC2'
> > hinted_handoff_max_window: 3h
> > hinted_handoff_max_file_size: 128mb
> > hinted_handoff_flush_period: 10s
> > hinted_handoff_compression:
> >   class_name: LZ4Compressor
> >   parameters:
> >     a: b
> >
> > track_warnings_enabled: true
> > track_warnings_local_read_size_warn_threshold: 1mb
> > track_warnings_local_read_size_abort_threshold: 10mb
> > track_warnings_coordinator_read_size_warn_threshold: 5mb
> > track_warnings_coordinator_read_size_abort_threshold: 20mb
> >
> > The main issue I have with flat structure is that we have no way to enforce standard naming; if you look at the hint example there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?). And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail and threshold sometimes was thresholds…. By embracing nested structure we can actually enforce consistency; with flat we have no way to maintain consistency.
> >
> > Additionally, by embracing the nested structure we can accept a flat one as well (the PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested, and the “grep” benefits of flat.
> >
> > > On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> > >
> > > If we’re thinking of moving towards nested configuration, then before employing the approach further we would ideally consider what a fully nested config looks like for the project. Ekaterina has done a lot to clean up inconsistent naming, but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.
> > >
> > > In case anyone missed it in the earlier discussion, this was my attempt to prototype a nested config:
> > > https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> > >
> > > I don’t have any specific attachment to it, but settling on some approximate scheme would be helpful IMO.
> > >
> > > From: David Capwell <dcapw...@apple.com.INVALID>
> > > Date: Monday, 29 November 2021 at 20:38
> > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >> What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
> > >
> > > I was told this statement was confusing, so I am trying to clarify. At the moment we do not allow a nested config to be expressed in any way outside of nesting it (excluding YAML’s ability to inline objects), so if we did allow a flat config representation of nested configs, then this would be a brand new feature; we currently show the nested structure in cassandra.yaml.
> > >
> > >> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID> wrote:
> > >>
> > >> Thanks everyone for the comments, I hope below is a good summary of all the talking points?
> > >>
> > >> We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
> > >> Flat configs are easier for grep, but this can be solved with grep -A/-B and/or yq
> > >> It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
> > >> "Settings" vtable currently uses the "_" separator (example of encryption/audit log). Switching to "." would be a change in behavior which may impact some users
> > >> "." separator for nested configs is common in other systems (yq, Elasticsearch, etc.)
> > >> "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
> > >> For learning what configs are enabled, cassandra.yaml isn’t the best interface as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
> > >> What should our default example cassandra.yaml file use (flat or nested)? Currently default shows nested
> > >> When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
> > >> Current limitations in CQL make nested structures hard to work with; it may be worthwhile to expand CQL support for nested structures.
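The summary notes that flat versions of nested configs could be accepted alongside the nested form. A minimal sketch of that normalisation, assuming the YAML has already been parsed into plain Python dicts and that `.` is the separator; `expand_dotted` is a hypothetical helper, not the code in the PR. Dotted keys are resolved relative to the mapping they appear in, and a setting given twice is rejected rather than silently overwritten.

```python
from typing import Any, Dict


def expand_dotted(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite {'a.b': v} as {'a': {'b': v}}, recursively.

    A dotted key is relative to the mapping it appears in, so it can be mixed
    freely with ordinary nested sections.
    """
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand_dotted(value)  # normalise nested sections first (returns a new dict)
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"'{key}' descends into non-mapping '{part}'")
        _merge(node, leaf, value, key)
    return result


def _merge(node: Dict[str, Any], name: str, value: Any, path: str) -> None:
    """Insert value under name, merging mappings and rejecting duplicate leaves."""
    existing = node.get(name)
    if isinstance(existing, dict) and isinstance(value, dict):
        for k, v in value.items():
            _merge(existing, k, v, f"{path}.{k}")
    elif name in node:
        raise ValueError(f"setting '{path}' is specified more than once")
    else:
        node[name] = value


if __name__ == "__main__":
    cfg = {
        "track_warnings": {
            "enabled": True,
            "coordinator_read_size.warn_threshold_kb": 1024,  # relative to track_warnings
            "row_index_size": {"warn_threshold_kb": 1024},
        },
        "track_warnings.coordinator_read_size.abort_threshold_kb": 42,  # relative to the root
    }
    print(expand_dotted(cfg)["track_warnings"]["coordinator_read_size"])
    # {'warn_threshold_kb': 1024, 'abort_threshold_kb': 42}
```

Rejecting duplicates is a design choice here; a real loader might instead define a precedence between the nested and dotted spellings.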
> > >>
> > >> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of YAML parsing, 2) support setters (we currently do, but setters must be snake case… I fixed that), 3) support both nested and structured, 4) support ignoring fields in a consistent way (the Settings vtable will include things SnakeYAML won’t and vice versa).
> > >>
> > >> https://github.com/apache/cassandra/pull/1335. This PR is NOT a final, ready-to-merge thing, but instead a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.
> > >>
> > >> The following cassandra.yaml was used to show that both worlds work fine in the config (and complement each other):
> > >>
> > >> track_warnings:
> > >>   enabled: true
> > >>   # nested relative to the local level (TrackWarnings)
> > >>   coordinator_read_size.warn_threshold_kb: 1024
> > >>   local_read_size.abort_threshold_kb: 1024
> > >>   row_index_size:
> > >>     warn_threshold_kb: 1024
> > >>     abort_threshold_kb: 1024
> > >> # nested relative to the top level
> > >> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> > >>
> > >> For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten would turn every property into a “flattened” version (isScalar (isPrimitive or not hasSubProperties) or isCollection). This doesn’t solve 100% of the issues that vtable has (types such as Duration would need additional translation, as they are scalar but need a translation from String -> Duration), and doesn’t solve the fact that the table currently uses “_”.
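The flattening described for the Settings vtable can be sketched as a walk over the nested config that emits one entry per leaf, where a leaf is anything other than a non-empty mapping (scalars and collections stay whole). This is a Python analogue written for illustration, not `Properties.flatten` from the PR; the `sep` parameter is an assumption added to show the `_` versus `.` question.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Map every leaf of a nested config to its joined path.

    Only non-empty dicts are descended into; scalars, lists, and empty dicts
    are emitted as-is, mirroring the 'scalar or collection' rule.
    """
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


if __name__ == "__main__":
    hints = {
        "hinted_handoff": {
            "enabled": True,
            "disabled_datacenters": ["DC1", "DC2"],
            "compression": {"class_name": "LZ4Compressor"},
        }
    }
    print(flatten(hints))            # dot-separated paths
    print(flatten(hints, sep="_"))   # underscore-separated paths
```

With `_` as the separator, `hinted_handoff_compression_class_name` no longer shows where one section ends and the next begins, because the names themselves contain `_`; with `.` the path can be split back unambiguously. Values such as `3h` also stay as strings here, so a Duration-style translation would still be a separate step.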
> > >>
> > >>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
> > >>>
> > >>> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
> > >>>
> > >>> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, which may help the user in other scenarios too…
> > >>>
> > >>> From: Joseph Lynch <joe.e.ly...@gmail.com>
> > >>> Date: Monday, 29 November 2021 at 17:32
> > >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org <bened...@apache.org> wrote:
> > >>>>
> > >>>> Maybe we can make our query language more expressive 😊
> > >>>>
> > >>>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
> > >>>
> > >>> This sounds more complicated than just having the settings virtual table return text (dot encoded) -> text (json), and probably not even that much more useful. A full table scan on the settings table could return all top-level keys (strings before the first dot), and if we just return a valid JSON string then users can bring their own querying capabilities via jq [1], or one line of code in almost any programming language (especially Python, Perl, etc.).
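The alternative of a settings table mapping a dot-encoded key to a JSON-encoded value keeps the server side simple and leaves querying to the client. The sketch below models that proposed table as a list of `(key, json_text)` rows; the rows and the helper names `to_rows` and `top_level_keys` are hypothetical. In practice rows would come from a query such as `SELECT name, value FROM system_views.settings`, but per the summary above that table uses `_`-joined names, so the dot/JSON encoding here is the proposal, not current behaviour.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (dot-encoded key, JSON-encoded value) -- proposed layout, hypothetical


def to_rows(flat: Dict[str, Any]) -> List[Row]:
    """Encode an already-flattened config as (key, JSON text) rows."""
    return [(key, json.dumps(value)) for key, value in sorted(flat.items())]


def top_level_keys(rows: Iterable[Row]) -> List[str]:
    """What a full scan would reveal: the distinct strings before the first dot."""
    return sorted({key.split(".", 1)[0] for key, _ in rows})


if __name__ == "__main__":
    rows = to_rows({
        "hinted_handoff.enabled": True,
        "hinted_handoff.disabled_datacenters": ["DC1", "DC2"],
        "track_warnings.local_read_size.warn_threshold": "1mb",
    })
    print(top_level_keys(rows))  # ['hinted_handoff', 'track_warnings']
    # The "one line of code" on the client side: decode the JSON value.
    print(json.loads(dict(rows)["hinted_handoff.disabled_datacenters"]))  # ['DC1', 'DC2']
```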
> > >>>
> > >>> Alternatively, if we want to modify the grammar, it seems supporting structured data querying on text fields would maybe be preferable to LIKE, since you could get what you want without a grammar change, and if we could generalize to any text column it would be amazingly useful elsewhere to users. For example, we could emulate jq's query syntax in the select, which is, imo, best-in-class for quickly querying into nested structures. Assuming a key (text) -> value (json) schema:
> > >>>
> > >>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> > >>>
> > >>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> To have exactly jq syntax (but harder to parse) it would be:
> > >>>
> > >>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> Since we're not indexing the structured data in any way, filtering before selection probably doesn't give us much performance improvement, as we'd still have to parse the whole text field in most cases.
> > >>>
> > >>> -Joey
> > >>>
> > >>> [1] https://stedolan.github.io/jq/
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
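The `json(value).b.0.c.d` and `json(value).b[0].c.d` forms are a proposed grammar extension, not existing CQL. Client-side, the same path lookup is short. The sketch below is a hypothetical `select_path` helper that accepts both spellings and walks already-decoded JSON, stepping by index into arrays and by key into objects; it uses valid (double-quoted) JSON for the value. Missing keys or indexes raise `KeyError`/`IndexError`, whereas jq itself yields `null` for a missing key.

```python
import json
import re
from typing import Any, List

# One path step: either a bare name/number ("b", "0") or a bracketed index ("[0]").
_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> List[str]:
    """Split 'b.0.c.d' or 'b[0].c.d' into ['b', '0', 'c', 'd']."""
    return [name or index for name, index in _STEP.findall(path)]


def select_path(document: Any, path: str) -> Any:
    """Walk decoded JSON: integer steps into lists, string keys into objects."""
    current = document
    for step in parse_path(path):
        if isinstance(current, list):
            current = current[int(step)]
        else:
            current = current[step]
    return current


if __name__ == "__main__":
    value = json.loads('{"b": [{"c": {"d": 4}}]}')
    print(select_path(value, "b.0.c.d"))   # 4
    print(select_path(value, "b[0].c.d"))  # 4
```

Supporting both spellings costs only the bracket alternative in the regular expression, which suggests the "harder to parse" jq form is mostly a grammar concern rather than an evaluation one.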