"we take "unclaimed" items and move them to their own InternalConfig.java
or something"

This is interesting. If we are meant to be still able to put these
properties into cassandra.yaml (even they are "internal ones") and they
would be just in InternalConfig.java for some basic separation of internal
/ user-facing configuration, then we would need to have two yaml loaders:

1) the one as we have now which loads cassandra.yaml it into Config.java
2) the second one which would load cassandra.yaml into InternalConfig.java

For both cases, we could not fail when there are unrecognized properties in
cassandra.yaml while parsing it (1), because every loader, for Config.java
as well as InternalConfig.java, is parsing just some "subset" of yaml.

(1)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java#L443-L444

If we just had "public InternalConfig internal = new InternalConfig" as a
field in Config.java, then this would lead to properties being effectively
renamed in cassandra.yaml like

internal:
    some_currently_internal_property: false

instead of just

some_currently_internal_property: false

I do not think we want to have them renamed / under different configuration
sections in yaml. I get that they are "internal" etc but we just don't know
how / where it is used and deployed and just blindly renaming them is not a
good idea imho.

On Mon, Jan 27, 2025 at 8:46 PM Josh McKenzie <jmcken...@apache.org> wrote:

> This may be an off-base comparison, but this reminds me of struggles we've
> had getting to 0 failing unit tests before and the debates on fencing off a
> snapshot of the current "failure set" so you can have a set point where no
> further degradation is allowed in a primary data set.
>
> All of which is to say - maybe at the end of the spreadsheet, we take
> "unclaimed" items and move them to their own InternalConfig.java or
> something and add an ant target that a) disallows further addition to
> InternalConfig.java w/out throwing an error / needing whitelist update, and
> b) disallows further regression in the Config.java <-> cassandra.yaml
> relationship for non-annotated fields.
>
> That way we can at least halt the progression of the disease even if we're
> stymied on cleaning up some of the existing symptoms.
>
> On Mon, Jan 27, 2025, at 1:38 PM, Štefan Miklošovič wrote:
>
> Indeed, we need to balance that and thoughtfully choose what is going to
> be added and what not. However, we should not hide something which is meant
> to be tweaked by a user. The config is intimidating mostly because
> everything is just in one file. I merely remember discussions a few years
> ago which were about splitting cassandra.yaml into multiple files which
> would be focused just on one subsystem / would cover some logically
> isolated domain.
>
> Anyway, I think the main goal of this effort for now would be to at least
> map where we are at. Some of them are genuinely missing. E.g. guardrails,
> how is a user meant to know about that if it is not even documented ...
>
> On Mon, Jan 27, 2025 at 6:16 PM Chris lohfink <cnl...@gmail.com> wrote:
>
> Might be a bit of a balance between exposing what people actually are
> likely to need to modify vs having a super intimidating config file. It's
> already nearly 2000 lines. Personally I'd rather see some
> auto-documentation or something that's in the docs
> <https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html>
> than an effort to manually add another 1000 lines.
>
> Chris
>
> On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
> Maybe I missed some patterns but it looks like a pretty good estimation, I
> did like 10 random checks manually to verify :-)
> I will try to make an ant target with a similar logic (hopefully, during
> the weekend)
> I will create a ticket to track this activity (to share attachments there
> to not overload the thread with such outputs in future).
>
> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> Oh my god, 112? :DD I was thinking it would be less than 10.
>
> Anyway, I think we need to integrate this to some ant target. If you
> expanded on this, that would be great.
>
> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
> A very primitive implementation of the 1st idea below:
>
> String configUrl = 
> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
> Field[] allFields = Config.class.getFields();
> List<String> topLevelPropertyNames = new ArrayList<>();
> for(Field field : allFields)
> {
>     if (!Modifier.*isStatic*(field.getModifiers()))
>     {
>         topLevelPropertyNames.add(field.getName());
>     }
> }
>
> URL url = new URL(configUrl);
> List<String> lines = Files.*readAllLines*(Paths.*get*(url.toURI()));
>
> int missedCount = 0;
> for (String propertyName : topLevelPropertyNames)
> {
>     boolean found = false;
>     for (String line : lines)
>     {
>         if (line.startsWith(propertyName + ":")
>             || line.startsWith("#" + propertyName + ":")
>             || line.startsWith("# " + propertyName + ":")) {
>             found = true;
>             break;
>         }
>     }
>     if (!found)
>     {
>         missedCount++;
>         System.*out*.println(propertyName);
>     }
> }
> System.*out*.println("Total missed:" + missedCount);
>
>
> It prints the following config property names which are defined in 
> Config.java but not present as "property" or "# property " in a file:
>
> permissions_cache_max_entries
> roles_cache_max_entries
> credentials_cache_max_entries
> auto_bootstrap
> force_new_prepared_statement_behaviour
> use_deterministic_table_id
> repair_request_timeout
> stream_transfer_task_timeout
> cms_await_timeout
> cms_default_max_retries
> cms_default_retry_backoff
> epoch_aware_debounce_inflight_tracker_max_size
> metadata_snapshot_frequency
> available_processors
> repair_session_max_tree_depth
> use_offheap_merkle_trees
> internode_max_message_size
> native_transport_max_message_size
> native_transport_max_request_data_in_flight_per_ip
> native_transport_max_request_data_in_flight
> native_transport_receive_queue_capacity
> min_free_space_per_drive
> max_space_usable_for_compactions_in_percentage
> reject_repair_compaction_threshold
> concurrent_index_builders
> max_streaming_retries
> commitlog_max_compression_buffers_in_pool
> max_mutation_size
> dynamic_snitch
> failure_detector
> use_creation_time_for_hint_ttl
> key_cache_migrate_during_compaction
> key_cache_invalidate_after_sstable_deletion
> paxos_cache_size
> file_cache_round_up
> disk_optimization_estimate_percentile
> disk_optimization_page_cross_chance
> purgeable_tobmstones_metric_granularity
> windows_timer_interval
> otc_coalescing_strategy
> otc_coalescing_window_us
> otc_coalescing_enough_coalesced_messages
> otc_backlog_expiration_interval_ms
> scripted_user_defined_functions_enabled
> user_defined_functions_threads_enabled
> allow_insecure_udfs
> allow_extra_insecure_udfs
> user_defined_functions_warn_timeout
> user_defined_functions_fail_timeout
> user_function_timeout_policy
> back_pressure_enabled
> back_pressure_strategy
> repair_command_pool_full_strategy
> repair_command_pool_size
> block_for_peers_timeout_in_secs
> block_for_peers_in_remote_dcs
> skip_stream_disk_space_check
> snapshot_on_repaired_data_mismatch
> validation_preview_purge_head_start
> initial_range_tombstone_list_allocation_size
> range_tombstone_list_growth_factor
> snapshot_on_duplicate_row_detection
> check_for_duplicate_rows_during_reads
> check_for_duplicate_rows_during_compaction
> autocompaction_on_startup_enabled
> auto_optimise_inc_repair_streams
> auto_optimise_full_repair_streams
> auto_optimise_preview_repair_streams
> consecutive_message_errors_threshold
> internode_error_reporting_exclusions
> compact_tables_enabled
> vector_type_enabled
> intersect_filtering_query_warned
> intersect_filtering_query_enabled
> streaming_slow_events_log_timeout
> repair_state_expires
> repair_state_size
> paxos_variant
> skip_paxos_repair_on_topology_change
> paxos_purge_grace_period
> paxos_on_linearizability_violations
> paxos_state_purging
> paxos_repair_enabled
> paxos_topology_repair_no_dc_checks
> paxos_topology_repair_strict_each_quorum
> skip_paxos_repair_on_topology_change_keyspaces
> paxos_contention_wait_randomizer
> paxos_contention_min_wait
> paxos_contention_max_wait
> paxos_contention_min_delta
> paxos_repair_parallelism
> sstable_read_rate_persistence_enabled
> client_request_size_metrics_enabled
> max_top_size_partition_count
> max_top_tombstone_partition_count
> min_tracked_partition_size
> min_tracked_partition_tombstone_count
> top_partitions_enabled
> severity_during_decommission
> progress_barrier_min_consistency_level
> progress_barrier_default_consistency_level
> progress_barrier_timeout
> progress_barrier_backoff
> discovery_timeout
> unsafe_tcm_mode
> cql_start_time
> native_transport_throw_on_overload
> native_transport_queue_max_item_age_threshold
> native_transport_min_backoff_on_queue_overload
> native_transport_max_backoff_on_queue_overload
> native_transport_timeout
> enforce_native_deadline_for_hints
> Total missed:112
>
>
>
> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> It should also work the other way around. If there is a property which is
> commented out in yaml and it is not in Config.java, that should fail as
> well. If it is not commented out and it is not in Config.java, that will
> fail in runtime as it fails on unrecognized property.
>
> This will be used in practice very rarely as we seldom remove the
> properties in Config but if we do and a property is commented out, we
> should not ship a dead property name, even commented out.
>
> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>
> >  >  If "# my_cool_property: true" is NOT in cassandra.yaml, we might
> indeed add it, also commented out. I think it would be quite easy to check
> against yaml if there is a line starting on "# my_cool_property" or just on
> "my_cool_property". Both cases would satisfy the check.
>
> Makes sense, I think this would be good to have as a lint or test to
> easily catch overlooks during review.
>
> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>
>
> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>
> > from time to time I see configuration properties in Config.java and they
> are clearly not in cassandra.yaml. Not every property in Config is in
> cassandra.yaml. I would like to know if there is some specific reason
> behind that.
>
> I think one of the original reasons was to "hide" advanced configs that
> are not meant to be updated, unless in very niche circumstances. However I
> think this has been extrapolated to non-advanced settings.
>
> > Question related to that is if we could not have a build-time check that
> all properties in Config have to be in cassandra.yaml and fail the build if
> a property in Config does not have its counterpart in yaml.
>
> Are you saying every configuration property should be commented-out, or do
> you think that every Config property should be specified in cassandra.yaml
> with their default uncomented ? One issue with that is that you could cause
> user confusion if you "reveal" a niche/advanced config that is not meant to
> be updated. I think this would be addressed by the @HiddenInYaml flag you
> are proposing in a later post.
>
>
> Yes, then can stay hidden, but we should annotate it with @Hidden or
> similar. As of now, if that property is not in yaml, we just don't know if
> it was forgotten to be added or if we have not added it on purpose.
>
> They can keep being commented out if they currently are. Imagine a
> property in Config.java
>
> public boolean my_cool_property = true;
>
> and then this in cassandra.yaml
>
> # my_cool_property: true
>
> It is completely ok.
>
> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed
> add it, also commented out. I think it would be quite easy to check against
> yaml if there is a line starting on "# my_cool_property" or just on
> "my_cool_property". Both cases would satisfy the check.
>
>
>
> > There are dozens of properties in Config and I have a strong suspicion
> that we missed to publish some to yaml so users do not even know such a
> property exists and as of now we do not even know which they are.
>
> I believe this is a problem. I think most properties should be in
> cassandra.yaml, unless they are very advanced or not meant to be updated.
>
> Another tangential issue is that there are features/settings that don't
> even have a Config entry, but are just controlled by JVM properties.
>
> I think that we should attempt to unify Config and jvm properties under a
> predictable structure. For example, if there is a YAML config
> enable_user_defined_functions, then there should be a respective JVM flag
> -Dcassandra.enable_user_defined_functions, and vice versa.
>
>
> Yeah, good idea.
>
>
>
> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> Hello,
>
> from time to time I see configuration properties in Config.java and they
> are clearly not in cassandra.yaml. Not every property in Config is in
> cassandra.yaml. I would like to know if there is some specific reason
> behind that.
>
> Question related to that is if we could not have a build-time check that
> all properties in Config have to be in cassandra.yaml and fail the build if
> a property in Config does not have its counterpart in yaml.
>
> There are dozens of properties in Config and I have a strong suspicion
> that we missed to publish some to yaml so users do not even know such a
> property exists and as of now we do not even know which they are.
>
>
>
> --
> Dmitry Konstantinov
>
>
>
> --
> Dmitry Konstantinov
>
>
>

Reply via email to