Indeed, we need to balance that and choose thoughtfully what gets added and
what does not. However, we should not hide something which is meant to be
tweaked by a user. The config is intimidating mostly because everything is
in one file. I vaguely remember discussions a few years ago about splitting
cassandra.yaml into multiple files, each focused on one subsystem or
covering some logically isolated domain.

Anyway, I think the main goal of this effort for now would be to at least
map where we are at. Some properties are genuinely missing, e.g. guardrails:
how is a user meant to know about them if they are not even documented ...

On Mon, Jan 27, 2025 at 6:16 PM Chris lohfink <cnl...@gmail.com> wrote:

> Might be a bit of a balance between exposing what people actually are
> likely to need to modify vs having a super intimidating config file. It's
> already nearly 2000 lines. Personally I'd rather see some
> auto-documentation or something that's in the docs
> <https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html>
> than an effort to manually add another 1000 lines.
>
> Chris
>
> On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
>> Maybe I missed some patterns but it looks like a pretty good estimation,
>> I did like 10 random checks manually to verify :-)
>> I will try to make an ant target with a similar logic (hopefully, during
>> the weekend)
>> I will create a ticket to track this activity (to share attachments there
>> to not overload the thread with such outputs in future).
>>
>> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> Oh my god, 112? :DD I was thinking it would be less than 10.
>>>
>>> Anyway, I think we need to integrate this into some ant target. If you
>>> expanded on this, that would be great.
>>>
>>> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com>
>>> wrote:
>>>
>>>> A very primitive implementation of the 1st idea below:
>>>>
>>>> String configUrl = 
>>>> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
>>>> Field[] allFields = Config.class.getFields();
>>>> List<String> topLevelPropertyNames = new ArrayList<>();
>>>> for(Field field : allFields)
>>>> {
>>>>     if (!Modifier.isStatic(field.getModifiers()))
>>>>     {
>>>>         topLevelPropertyNames.add(field.getName());
>>>>     }
>>>> }
>>>>
>>>> URL url = new URL(configUrl);
>>>> List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
>>>>
>>>> int missedCount = 0;
>>>> for (String propertyName : topLevelPropertyNames)
>>>> {
>>>>     boolean found = false;
>>>>     for (String line : lines)
>>>>     {
>>>>         if (line.startsWith(propertyName + ":")
>>>>             || line.startsWith("#" + propertyName + ":")
>>>>             || line.startsWith("# " + propertyName + ":")) {
>>>>             found = true;
>>>>             break;
>>>>         }
>>>>     }
>>>>     if (!found)
>>>>     {
>>>>         missedCount++;
>>>>         System.out.println(propertyName);
>>>>     }
>>>> }
>>>> System.out.println("Total missed:" + missedCount);
>>>>
>>>>
>>>> It prints the following config property names which are defined in 
>>>> Config.java but not present as "property" or "# property " in a file:
>>>>
>>>> permissions_cache_max_entries
>>>> roles_cache_max_entries
>>>> credentials_cache_max_entries
>>>> auto_bootstrap
>>>> force_new_prepared_statement_behaviour
>>>> use_deterministic_table_id
>>>> repair_request_timeout
>>>> stream_transfer_task_timeout
>>>> cms_await_timeout
>>>> cms_default_max_retries
>>>> cms_default_retry_backoff
>>>> epoch_aware_debounce_inflight_tracker_max_size
>>>> metadata_snapshot_frequency
>>>> available_processors
>>>> repair_session_max_tree_depth
>>>> use_offheap_merkle_trees
>>>> internode_max_message_size
>>>> native_transport_max_message_size
>>>> native_transport_max_request_data_in_flight_per_ip
>>>> native_transport_max_request_data_in_flight
>>>> native_transport_receive_queue_capacity
>>>> min_free_space_per_drive
>>>> max_space_usable_for_compactions_in_percentage
>>>> reject_repair_compaction_threshold
>>>> concurrent_index_builders
>>>> max_streaming_retries
>>>> commitlog_max_compression_buffers_in_pool
>>>> max_mutation_size
>>>> dynamic_snitch
>>>> failure_detector
>>>> use_creation_time_for_hint_ttl
>>>> key_cache_migrate_during_compaction
>>>> key_cache_invalidate_after_sstable_deletion
>>>> paxos_cache_size
>>>> file_cache_round_up
>>>> disk_optimization_estimate_percentile
>>>> disk_optimization_page_cross_chance
>>>> purgeable_tobmstones_metric_granularity
>>>> windows_timer_interval
>>>> otc_coalescing_strategy
>>>> otc_coalescing_window_us
>>>> otc_coalescing_enough_coalesced_messages
>>>> otc_backlog_expiration_interval_ms
>>>> scripted_user_defined_functions_enabled
>>>> user_defined_functions_threads_enabled
>>>> allow_insecure_udfs
>>>> allow_extra_insecure_udfs
>>>> user_defined_functions_warn_timeout
>>>> user_defined_functions_fail_timeout
>>>> user_function_timeout_policy
>>>> back_pressure_enabled
>>>> back_pressure_strategy
>>>> repair_command_pool_full_strategy
>>>> repair_command_pool_size
>>>> block_for_peers_timeout_in_secs
>>>> block_for_peers_in_remote_dcs
>>>> skip_stream_disk_space_check
>>>> snapshot_on_repaired_data_mismatch
>>>> validation_preview_purge_head_start
>>>> initial_range_tombstone_list_allocation_size
>>>> range_tombstone_list_growth_factor
>>>> snapshot_on_duplicate_row_detection
>>>> check_for_duplicate_rows_during_reads
>>>> check_for_duplicate_rows_during_compaction
>>>> autocompaction_on_startup_enabled
>>>> auto_optimise_inc_repair_streams
>>>> auto_optimise_full_repair_streams
>>>> auto_optimise_preview_repair_streams
>>>> consecutive_message_errors_threshold
>>>> internode_error_reporting_exclusions
>>>> compact_tables_enabled
>>>> vector_type_enabled
>>>> intersect_filtering_query_warned
>>>> intersect_filtering_query_enabled
>>>> streaming_slow_events_log_timeout
>>>> repair_state_expires
>>>> repair_state_size
>>>> paxos_variant
>>>> skip_paxos_repair_on_topology_change
>>>> paxos_purge_grace_period
>>>> paxos_on_linearizability_violations
>>>> paxos_state_purging
>>>> paxos_repair_enabled
>>>> paxos_topology_repair_no_dc_checks
>>>> paxos_topology_repair_strict_each_quorum
>>>> skip_paxos_repair_on_topology_change_keyspaces
>>>> paxos_contention_wait_randomizer
>>>> paxos_contention_min_wait
>>>> paxos_contention_max_wait
>>>> paxos_contention_min_delta
>>>> paxos_repair_parallelism
>>>> sstable_read_rate_persistence_enabled
>>>> client_request_size_metrics_enabled
>>>> max_top_size_partition_count
>>>> max_top_tombstone_partition_count
>>>> min_tracked_partition_size
>>>> min_tracked_partition_tombstone_count
>>>> top_partitions_enabled
>>>> severity_during_decommission
>>>> progress_barrier_min_consistency_level
>>>> progress_barrier_default_consistency_level
>>>> progress_barrier_timeout
>>>> progress_barrier_backoff
>>>> discovery_timeout
>>>> unsafe_tcm_mode
>>>> cql_start_time
>>>> native_transport_throw_on_overload
>>>> native_transport_queue_max_item_age_threshold
>>>> native_transport_min_backoff_on_queue_overload
>>>> native_transport_max_backoff_on_queue_overload
>>>> native_transport_timeout
>>>> enforce_native_deadline_for_hints
>>>> Total missed:112
>>>>
>>>>
>>>>
>>>> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org>
>>>> wrote:
>>>>
>>>>> It should also work the other way around. If there is a property which
>>>>> is commented out in yaml and it is not in Config.java, the check should
>>>>> fail as well. If it is not commented out and it is not in Config.java,
>>>>> that already fails at runtime, as loading fails on an unrecognized
>>>>> property.
>>>>>
>>>>> This will be used in practice very rarely as we seldom remove
>>>>> properties from Config, but if we do and a property is commented out, we
>>>>> should not ship a dead property name, even commented out.
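
A minimal sketch of that reverse check, under stated assumptions: the class
name DeadYamlPropertyCheck and the regex are hypothetical, and the set of
Config field names is passed in as a plain Set here (in the real check it
would come from Config.class.getFields(), as in the snippet quoted further
down). The regex is only a heuristic: an ordinary comment line that happens
to look like "# word:" would be reported as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeadYamlPropertyCheck
{
    // Top-level key at column 0, optionally commented out:
    // "name:", "#name:" or "# name:". Indented (nested) keys do not match.
    private static final Pattern TOP_LEVEL_KEY =
        Pattern.compile("^(?:# ?)?([a-z][a-z0-9_]*):");

    // Returns yaml property names (commented out or not) that have no
    // matching field name in the given set, in order of first appearance.
    static List<String> deadProperties(List<String> yamlLines, Set<String> configFieldNames)
    {
        List<String> dead = new ArrayList<>();
        for (String line : yamlLines)
        {
            Matcher m = TOP_LEVEL_KEY.matcher(line);
            if (!m.find())
                continue;
            String name = m.group(1);
            if (!configFieldNames.contains(name) && !dead.contains(name))
                dead.add(name);
        }
        return dead;
    }

    public static void main(String[] args)
    {
        // Hypothetical yaml content and field names, for illustration only.
        List<String> yaml = List.of("cluster_name: 'Test Cluster'",
                                    "# my_cool_property: true",
                                    "# removed_property: 10",
                                    "# Note: this is an ordinary comment",
                                    "    nested_key: value");
        Set<String> fields = Set.of("cluster_name", "my_cool_property");
        System.out.println(deadProperties(yaml, fields));
    }
}
```

A build target could fail whenever the returned list is not empty.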
>>>>>
>>>>> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>>>>>
>>>>>> > If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>>>>>> indeed add it, also commented out. I think it would be quite easy to
>>>>>> check against yaml if there is a line starting with "# my_cool_property"
>>>>>> or just with "my_cool_property". Both cases would satisfy the check.
>>>>>>
>>>>>> Makes sense, I think this would be good to have as a lint or test to
>>>>>> easily catch overlooks during review.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <
>>>>>> smikloso...@apache.org> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > from time to time I see configuration properties in Config.java
>>>>>>>> and they are clearly not in cassandra.yaml. Not every property in
>>>>>>>> Config is in cassandra.yaml. I would like to know if there is some
>>>>>>>> specific reason behind that.
>>>>>>>>
>>>>>>>> I think one of the original reasons was to "hide" advanced configs
>>>>>>>> that are not meant to be updated, unless in very niche circumstances.
>>>>>>>> However I think this has been extrapolated to non-advanced settings.
>>>>>>>>
>>>>>>>> > Question related to that is if we could not have a build-time
>>>>>>>> check that all properties in Config have to be in cassandra.yaml and
>>>>>>>> fail the build if a property in Config does not have its counterpart
>>>>>>>> in yaml.
>>>>>>>>
>>>>>>>> Are you saying every configuration property should be
>>>>>>>> commented-out, or do you think that every Config property should be
>>>>>>>> specified in cassandra.yaml with its default uncommented? One issue
>>>>>>>> with that is that you could cause user confusion if you "reveal" a
>>>>>>>> niche/advanced config that is not meant to be updated. I think this
>>>>>>>> would be addressed by the @HiddenInYaml flag you are proposing in a
>>>>>>>> later post.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, they can stay hidden, but we should annotate them with @Hidden or
>>>>>>> similar. As of now, if a property is not in yaml, we just don't know
>>>>>>> whether someone forgot to add it or we left it out on purpose.
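
A rough sketch of how such an annotation could plug into the check,
assuming a marker annotation named @HiddenInYaml as mentioned above. Its
definition, the SampleConfig stand-in for Config and the method name
missingFromYaml are all hypothetical, nothing here is existing Cassandra
code. The line matching is the same prefix test as in the snippet quoted
further down.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class HiddenAwareYamlCheck
{
    // Hypothetical marker: "this field is left out of yaml on purpose".
    // RUNTIME retention is required so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface HiddenInYaml {}

    // Stand-in for Config.java, for illustration only.
    static class SampleConfig
    {
        public static final String IGNORED_CONSTANT = "x";
        public boolean my_cool_property = true;
        public int forgotten_property = 1;
        @HiddenInYaml
        public int niche_property = 42;
    }

    // Names of public non-static fields that are neither present in yaml
    // (plain or commented out) nor annotated as intentionally hidden.
    static List<String> missingFromYaml(Class<?> configClass, List<String> yamlLines)
    {
        List<String> missing = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers())
                || field.isAnnotationPresent(HiddenInYaml.class))
                continue;
            String name = field.getName();
            boolean found = yamlLines.stream().anyMatch(line ->
                line.startsWith(name + ":")
                || line.startsWith("#" + name + ":")
                || line.startsWith("# " + name + ":"));
            if (!found)
                missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args)
    {
        List<String> yaml = List.of("# my_cool_property: true");
        System.out.println(missingFromYaml(SampleConfig.class, yaml));
    }
}
```

With this, anything the check reports was forgotten rather than hidden on
purpose, which is exactly the distinction we cannot make today.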
>>>>>>>
>>>>>>> They can keep being commented out if they currently are. Imagine a
>>>>>>> property in Config.java
>>>>>>>
>>>>>>> public boolean my_cool_property = true;
>>>>>>>
>>>>>>> and then this in cassandra.yaml
>>>>>>>
>>>>>>> # my_cool_property: true
>>>>>>>
>>>>>>> It is completely ok.
>>>>>>>
>>>>>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>>>>>>> indeed add it, also commented out. I think it would be quite easy to
>>>>>>> check against yaml if there is a line starting with
>>>>>>> "# my_cool_property" or just with "my_cool_property". Both cases would
>>>>>>> satisfy the check.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> > There are dozens of properties in Config and I have a strong
>>>>>>>> suspicion that we failed to publish some to yaml so users do not even
>>>>>>>> know such a property exists and as of now we do not even know which
>>>>>>>> they are.
>>>>>>>>
>>>>>>>> I believe this is a problem. I think most properties should be in
>>>>>>>> cassandra.yaml, unless they are very advanced or not meant to be
>>>>>>>> updated.
>>>>>>>>
>>>>>>>> Another tangential issue is that there are features/settings that
>>>>>>>> don't even have a Config entry, but are just controlled by JVM
>>>>>>>> properties.
>>>>>>>>
>>>>>>>> I think that we should attempt to unify Config and jvm properties
>>>>>>>> under a predictable structure. For example, if there is a YAML config
>>>>>>>> enable_user_defined_functions, then there should be a respective JVM
>>>>>>>> flag -Dcassandra.enable_user_defined_functions, and vice versa.
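
One way to picture that predictable structure is a purely mechanical
mapping between the two names. This is only a sketch: the class
ConfigPropertyNames and its methods are hypothetical, the "cassandra."
prefix is taken from the example above, and letting the JVM flag override
the yaml value is an assumption, not something decided in this thread.

```java
final class ConfigPropertyNames
{
    // Assumed common prefix, taken from the -Dcassandra.<name> example.
    static final String PREFIX = "cassandra.";

    // yaml name -> system property name
    static String toSystemProperty(String yamlName)
    {
        return PREFIX + yamlName;
    }

    // system property name -> yaml name (the "vice versa" direction)
    static String toYamlName(String systemProperty)
    {
        if (!systemProperty.startsWith(PREFIX))
            throw new IllegalArgumentException("Not a cassandra.* property: " + systemProperty);
        return systemProperty.substring(PREFIX.length());
    }

    // Assumption: a JVM flag, when set, overrides the value from yaml.
    static String resolve(String yamlName, String yamlValue)
    {
        return System.getProperty(toSystemProperty(yamlName), yamlValue);
    }

    public static void main(String[] args)
    {
        String name = "enable_user_defined_functions";
        System.out.println("-D" + toSystemProperty(name) + "=true");
        System.out.println(toYamlName("cassandra.enable_user_defined_functions"));
        System.out.println(resolve(name, "false"));
    }
}
```

Because the mapping is mechanical, the same build-time check could verify
both directions without a hand-maintained table.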
>>>>>>>>
>>>>>>>
>>>>>>> Yeah, good idea.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <
>>>>>>>> smikloso...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> from time to time I see configuration properties in Config.java
>>>>>>>>> and they are clearly not in cassandra.yaml. Not every property in
>>>>>>>>> Config is in cassandra.yaml. I would like to know if there is some
>>>>>>>>> specific reason behind that.
>>>>>>>>>
>>>>>>>>> Question related to that is if we could not have a build-time
>>>>>>>>> check that all properties in Config have to be in cassandra.yaml and
>>>>>>>>> fail the build if a property in Config does not have its counterpart
>>>>>>>>> in yaml.
>>>>>>>>>
>>>>>>>>> There are dozens of properties in Config and I have a strong
>>>>>>>>> suspicion that we failed to publish some to yaml so users do not even
>>>>>>>>> know such a property exists and as of now we do not even know which
>>>>>>>>> they are.
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> Dmitry Konstantinov
>>>>
>>>
>>
>> --
>> Dmitry Konstantinov
>>
>
