>>
https://docs.google.com/spreadsheets/d/11MOxhNqwE1tWP4ex2gzKG2pmeAWFaHDKo-CRp25h9BU/edit?gid=0#gid=0
We still have a lot of rows empty. I have added many default values and a
Cassandra version when a parameter was introduced (to differentiate some
recent parameters from old ones) based on source code but it would be nice
to get a description for parameters from the authors as well as
classification exposed/hidden.
Maybe we should not wait for collecting info about all parameters and
update what we have + use a threshold in the Ant validation task to fail
when new missed parameters are added. The logic in the dev branch here
https://issues.apache.org/jira/browse/CASSANDRA-20249 already supports a
threshold.


On Tue, 28 Jan 2025 at 00:04, Josh McKenzie <jmcken...@apache.org> wrote:

> Good point re: the implications of parsing and durability in the face of
> seeing unknown or missing parameters. I don't think widening the scope on
> that would be ideal, especially considering the entire impetus for this
> conversation is "we've misbehaved with our config and have a bunch of
> undocumented stuff we're not sure is still useful, or what it's for". =/
>
> On Mon, Jan 27, 2025, at 3:41 PM, Štefan Miklošovič wrote:
>
> "we take "unclaimed" items and move them to their own InternalConfig.java
> or something"
>
> This is interesting. If we are meant to be still able to put these
> properties into cassandra.yaml (even they are "internal ones") and they
> would be just in InternalConfig.java for some basic separation of internal
> / user-facing configuration, then we would need to have two yaml loaders:
>
> 1) the one as we have now which loads cassandra.yaml it into Config.java
> 2) the second one which would load cassandra.yaml into InternalConfig.java
>
> For both cases, we could not fail when there are unrecognized properties
> in cassandra.yaml while parsing it (1), because every loader, for
> Config.java as well as InternalConfig.java, is parsing just some "subset"
> of yaml.
>
> (1)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java#L443-L444
>
> If we just had "public InternalConfig internal = new InternalConfig" as a
> field in Config.java, then this would lead to properties being effectively
> renamed in cassandra.yaml like
>
> internal:
>     some_currently_internal_property: false
>
> instead of just
>
> some_currently_internal_property: false
>
> I do not think we want to have them renamed / under different
> configuration sections in yaml. I get that they are "internal" etc but we
> just don't know how / where it is used and deployed and just blindly
> renaming them is not a good idea imho.
>
> On Mon, Jan 27, 2025 at 8:46 PM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
>
> This may be an off-base comparison, but this reminds me of struggles we've
> had getting to 0 failing unit tests before and the debates on fencing off a
> snapshot of the current "failure set" so you can have a set point where no
> further degradation is allowed in a primary data set.
>
> All of which is to say - maybe at the end of the spreadsheet, we take
> "unclaimed" items and move them to their own InternalConfig.java or
> something and add an ant target that a) disallows further addition to
> InternalConfig.java w/out throwing an error / needing whitelist update, and
> b) disallows further regression in the Config.java <-> cassandra.yaml
> relationship for non-annotated fields.
>
> That way we can at least halt the progression of the disease even if we're
> stymied on cleaning up some of the existing symptoms.
>
> On Mon, Jan 27, 2025, at 1:38 PM, Štefan Miklošovič wrote:
>
> Indeed, we need to balance that and thoughtfully choose what is going to
> be added and what not. However, we should not hide something which is meant
> to be tweaked by a user. The config is intimidating mostly because
> everything is just in one file. I merely remember discussions a few years
> ago which were about splitting cassandra.yaml into multiple files which
> would be focused just on one subsystem / would cover some logically
> isolated domain.
>
> Anyway, I think the main goal of this effort for now would be to at least
> map where we are at. Some of them are genuinely missing. E.g. guardrails,
> how is a user meant to know about that if it is not even documented ...
>
> On Mon, Jan 27, 2025 at 6:16 PM Chris lohfink <cnl...@gmail.com> wrote:
>
> Might be a bit of a balance between exposing what people actually are
> likely to need to modify vs having a super intimidating config file. It's
> already nearly 2000 lines. Personally I'd rather see some
> auto-documentation or something that's in the docs
> <https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html>
> than an effort to manually add another 1000 lines.
>
> Chris
>
> On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
> Maybe I missed some patterns but it looks like a pretty good estimation, I
> did like 10 random checks manually to verify :-)
> I will try to make an ant target with a similar logic (hopefully, during
> the weekend)
> I will create a ticket to track this activity (to share attachments there
> to not overload the thread with such outputs in future).
>
> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> Oh my god, 112? :DD I was thinking it would be less than 10.
>
> Anyway, I think we need to integrate this to some ant target. If you
> expanded on this, that would be great.
>
> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
> A very primitive implementation of the 1st idea below:
>
> String configUrl = 
> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
> Field[] allFields = Config.class.getFields();
> List<String> topLevelPropertyNames = new ArrayList<>();
> for(Field field : allFields)
> {
>     if (!Modifier.*isStatic*(field.getModifiers()))
>     {
>         topLevelPropertyNames.add(field.getName());
>     }
> }
>
> URL url = new URL(configUrl);
> List<String> lines = Files.*readAllLines*(Paths.*get*(url.toURI()));
>
> int missedCount = 0;
> for (String propertyName : topLevelPropertyNames)
> {
>     boolean found = false;
>     for (String line : lines)
>     {
>         if (line.startsWith(propertyName + ":")
>             || line.startsWith("#" + propertyName + ":")
>             || line.startsWith("# " + propertyName + ":")) {
>             found = true;
>             break;
>         }
>     }
>     if (!found)
>     {
>         missedCount++;
>         System.*out*.println(propertyName);
>     }
> }
> System.*out*.println("Total missed:" + missedCount);
>
>
> It prints the following config property names which are defined in 
> Config.java but not present as "property" or "# property " in a file:
>
> permissions_cache_max_entries
> roles_cache_max_entries
> credentials_cache_max_entries
> auto_bootstrap
> force_new_prepared_statement_behaviour
> use_deterministic_table_id
> repair_request_timeout
> stream_transfer_task_timeout
> cms_await_timeout
> cms_default_max_retries
> cms_default_retry_backoff
> epoch_aware_debounce_inflight_tracker_max_size
> metadata_snapshot_frequency
> available_processors
> repair_session_max_tree_depth
> use_offheap_merkle_trees
> internode_max_message_size
> native_transport_max_message_size
> native_transport_max_request_data_in_flight_per_ip
> native_transport_max_request_data_in_flight
> native_transport_receive_queue_capacity
> min_free_space_per_drive
> max_space_usable_for_compactions_in_percentage
> reject_repair_compaction_threshold
> concurrent_index_builders
> max_streaming_retries
> commitlog_max_compression_buffers_in_pool
> max_mutation_size
> dynamic_snitch
> failure_detector
> use_creation_time_for_hint_ttl
> key_cache_migrate_during_compaction
> key_cache_invalidate_after_sstable_deletion
> paxos_cache_size
> file_cache_round_up
> disk_optimization_estimate_percentile
> disk_optimization_page_cross_chance
> purgeable_tobmstones_metric_granularity
> windows_timer_interval
> otc_coalescing_strategy
> otc_coalescing_window_us
> otc_coalescing_enough_coalesced_messages
> otc_backlog_expiration_interval_ms
> scripted_user_defined_functions_enabled
> user_defined_functions_threads_enabled
> allow_insecure_udfs
> allow_extra_insecure_udfs
> user_defined_functions_warn_timeout
> user_defined_functions_fail_timeout
> user_function_timeout_policy
> back_pressure_enabled
> back_pressure_strategy
> repair_command_pool_full_strategy
> repair_command_pool_size
> block_for_peers_timeout_in_secs
> block_for_peers_in_remote_dcs
> skip_stream_disk_space_check
> snapshot_on_repaired_data_mismatch
> validation_preview_purge_head_start
> initial_range_tombstone_list_allocation_size
> range_tombstone_list_growth_factor
> snapshot_on_duplicate_row_detection
> check_for_duplicate_rows_during_reads
> check_for_duplicate_rows_during_compaction
> autocompaction_on_startup_enabled
> auto_optimise_inc_repair_streams
> auto_optimise_full_repair_streams
> auto_optimise_preview_repair_streams
> consecutive_message_errors_threshold
> internode_error_reporting_exclusions
> compact_tables_enabled
> vector_type_enabled
> intersect_filtering_query_warned
> intersect_filtering_query_enabled
> streaming_slow_events_log_timeout
> repair_state_expires
> repair_state_size
> paxos_variant
> skip_paxos_repair_on_topology_change
> paxos_purge_grace_period
> paxos_on_linearizability_violations
> paxos_state_purging
> paxos_repair_enabled
> paxos_topology_repair_no_dc_checks
> paxos_topology_repair_strict_each_quorum
> skip_paxos_repair_on_topology_change_keyspaces
> paxos_contention_wait_randomizer
> paxos_contention_min_wait
> paxos_contention_max_wait
> paxos_contention_min_delta
> paxos_repair_parallelism
> sstable_read_rate_persistence_enabled
> client_request_size_metrics_enabled
> max_top_size_partition_count
> max_top_tombstone_partition_count
> min_tracked_partition_size
> min_tracked_partition_tombstone_count
> top_partitions_enabled
> severity_during_decommission
> progress_barrier_min_consistency_level
> progress_barrier_default_consistency_level
> progress_barrier_timeout
> progress_barrier_backoff
> discovery_timeout
> unsafe_tcm_mode
> cql_start_time
> native_transport_throw_on_overload
> native_transport_queue_max_item_age_threshold
> native_transport_min_backoff_on_queue_overload
> native_transport_max_backoff_on_queue_overload
> native_transport_timeout
> enforce_native_deadline_for_hints
> Total missed:112
>
>
>
> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> It should also work the other way around. If there is a property which is
> commented out in yaml and it is not in Config.java, that should fail as
> well. If it is not commented out and it is not in Config.java, that will
> fail in runtime as it fails on unrecognized property.
>
> This will be used in practice very rarely as we seldom remove the
> properties in Config but if we do and a property is commented out, we
> should not ship a dead property name, even commented out.
>
> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>
> >  >  If "# my_cool_property: true" is NOT in cassandra.yaml, we might
> indeed add it, also commented out. I think it would be quite easy to check
> against yaml if there is a line starting on "# my_cool_property" or just on
> "my_cool_property". Both cases would satisfy the check.
>
> Makes sense, I think this would be good to have as a lint or test to
> easily catch overlooks during review.
>
> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>
>
> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>
> > from time to time I see configuration properties in Config.java and they
> are clearly not in cassandra.yaml. Not every property in Config is in
> cassandra.yaml. I would like to know if there is some specific reason
> behind that.
>
> I think one of the original reasons was to "hide" advanced configs that
> are not meant to be updated, unless in very niche circumstances. However I
> think this has been extrapolated to non-advanced settings.
>
> > Question related to that is if we could not have a build-time check that
> all properties in Config have to be in cassandra.yaml and fail the build if
> a property in Config does not have its counterpart in yaml.
>
> Are you saying every configuration property should be commented-out, or do
> you think that every Config property should be specified in cassandra.yaml
> with their default uncomented ? One issue with that is that you could cause
> user confusion if you "reveal" a niche/advanced config that is not meant to
> be updated. I think this would be addressed by the @HiddenInYaml flag you
> are proposing in a later post.
>
>
> Yes, then can stay hidden, but we should annotate it with @Hidden or
> similar. As of now, if that property is not in yaml, we just don't know if
> it was forgotten to be added or if we have not added it on purpose.
>
> They can keep being commented out if they currently are. Imagine a
> property in Config.java
>
> public boolean my_cool_property = true;
>
> and then this in cassandra.yaml
>
> # my_cool_property: true
>
> It is completely ok.
>
> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed
> add it, also commented out. I think it would be quite easy to check against
> yaml if there is a line starting on "# my_cool_property" or just on
> "my_cool_property". Both cases would satisfy the check.
>
>
>
> > There are dozens of properties in Config and I have a strong suspicion
> that we missed to publish some to yaml so users do not even know such a
> property exists and as of now we do not even know which they are.
>
> I believe this is a problem. I think most properties should be in
> cassandra.yaml, unless they are very advanced or not meant to be updated.
>
> Another tangential issue is that there are features/settings that don't
> even have a Config entry, but are just controlled by JVM properties.
>
> I think that we should attempt to unify Config and jvm properties under a
> predictable structure. For example, if there is a YAML config
> enable_user_defined_functions, then there should be a respective JVM flag
> -Dcassandra.enable_user_defined_functions, and vice versa.
>
>
> Yeah, good idea.
>
>
>
> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
> Hello,
>
> from time to time I see configuration properties in Config.java and they
> are clearly not in cassandra.yaml. Not every property in Config is in
> cassandra.yaml. I would like to know if there is some specific reason
> behind that.
>
> Question related to that is if we could not have a build-time check that
> all properties in Config have to be in cassandra.yaml and fail the build if
> a property in Config does not have its counterpart in yaml.
>
> There are dozens of properties in Config and I have a strong suspicion
> that we missed to publish some to yaml so users do not even know such a
> property exists and as of now we do not even know which they are.
>
>
>
> --
> Dmitry Konstantinov
>
>
>
> --
> Dmitry Konstantinov
>
>
>
>

-- 
Dmitry Konstantinov

Reply via email to