There is probably a balance to strike between exposing what people are
actually likely to need to modify and shipping a super intimidating config
file. It's already nearly 2000 lines. Personally I'd rather see some
auto-documentation, or something in the docs
<https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html>,
than an effort to manually add another 1000 lines.

Chris

On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <netud...@gmail.com>
wrote:

> Maybe I missed some patterns, but it looks like a pretty good estimate; I
> did about 10 random manual checks to verify :-)
> I will try to make an ant target with similar logic (hopefully over
> the weekend).
> I will create a ticket to track this activity (and share attachments there,
> so the thread is not overloaded with such outputs in future).
>
> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org>
> wrote:
>
>> Oh my god, 112? :DD I was thinking it would be less than 10.
>>
>> Anyway, I think we need to integrate this into some ant target. If you
>> could expand on this, that would be great.
>>
>> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com>
>> wrote:
>>
>>> A very primitive implementation of the 1st idea below:
>>>
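>>> import java.lang.reflect.Field;
>>> import java.lang.reflect.Modifier;
>>> import java.net.URL;
>>> import java.nio.file.Files;
>>> import java.nio.file.Paths;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.apache.cassandra.config.Config;
>>>
>>> // The statements below go inside a method that declares "throws Exception".
>>> String configUrl =
>>> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
>>>
>>> // Collect the names of all public, non-static fields of Config.
>>> Field[] allFields = Config.class.getFields();
>>> List<String> topLevelPropertyNames = new ArrayList<>();
>>> for (Field field : allFields)
>>> {
>>>     if (!Modifier.isStatic(field.getModifiers()))
>>>     {
>>>         topLevelPropertyNames.add(field.getName());
>>>     }
>>> }
>>>
>>> URL url = new URL(configUrl);
>>> List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
>>>
>>> // A property counts as present if some line starts with "name:",
>>> // "#name:" or "# name:".
>>> int missedCount = 0;
>>> for (String propertyName : topLevelPropertyNames)
>>> {
>>>     boolean found = false;
>>>     for (String line : lines)
>>>     {
>>>         if (line.startsWith(propertyName + ":")
>>>             || line.startsWith("#" + propertyName + ":")
>>>             || line.startsWith("# " + propertyName + ":"))
>>>         {
>>>             found = true;
>>>             break;
>>>         }
>>>     }
>>>     if (!found)
>>>     {
>>>         missedCount++;
>>>         System.out.println(propertyName);
>>>     }
>>> }
>>> System.out.println("Total missed:" + missedCount);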
>>>
>>>
>>> It prints the following config property names, which are defined in
>>> Config.java but not present as "property:", "#property:" or
>>> "# property:" at the start of a line in the file:
>>>
>>> permissions_cache_max_entries
>>> roles_cache_max_entries
>>> credentials_cache_max_entries
>>> auto_bootstrap
>>> force_new_prepared_statement_behaviour
>>> use_deterministic_table_id
>>> repair_request_timeout
>>> stream_transfer_task_timeout
>>> cms_await_timeout
>>> cms_default_max_retries
>>> cms_default_retry_backoff
>>> epoch_aware_debounce_inflight_tracker_max_size
>>> metadata_snapshot_frequency
>>> available_processors
>>> repair_session_max_tree_depth
>>> use_offheap_merkle_trees
>>> internode_max_message_size
>>> native_transport_max_message_size
>>> native_transport_max_request_data_in_flight_per_ip
>>> native_transport_max_request_data_in_flight
>>> native_transport_receive_queue_capacity
>>> min_free_space_per_drive
>>> max_space_usable_for_compactions_in_percentage
>>> reject_repair_compaction_threshold
>>> concurrent_index_builders
>>> max_streaming_retries
>>> commitlog_max_compression_buffers_in_pool
>>> max_mutation_size
>>> dynamic_snitch
>>> failure_detector
>>> use_creation_time_for_hint_ttl
>>> key_cache_migrate_during_compaction
>>> key_cache_invalidate_after_sstable_deletion
>>> paxos_cache_size
>>> file_cache_round_up
>>> disk_optimization_estimate_percentile
>>> disk_optimization_page_cross_chance
>>> purgeable_tobmstones_metric_granularity
>>> windows_timer_interval
>>> otc_coalescing_strategy
>>> otc_coalescing_window_us
>>> otc_coalescing_enough_coalesced_messages
>>> otc_backlog_expiration_interval_ms
>>> scripted_user_defined_functions_enabled
>>> user_defined_functions_threads_enabled
>>> allow_insecure_udfs
>>> allow_extra_insecure_udfs
>>> user_defined_functions_warn_timeout
>>> user_defined_functions_fail_timeout
>>> user_function_timeout_policy
>>> back_pressure_enabled
>>> back_pressure_strategy
>>> repair_command_pool_full_strategy
>>> repair_command_pool_size
>>> block_for_peers_timeout_in_secs
>>> block_for_peers_in_remote_dcs
>>> skip_stream_disk_space_check
>>> snapshot_on_repaired_data_mismatch
>>> validation_preview_purge_head_start
>>> initial_range_tombstone_list_allocation_size
>>> range_tombstone_list_growth_factor
>>> snapshot_on_duplicate_row_detection
>>> check_for_duplicate_rows_during_reads
>>> check_for_duplicate_rows_during_compaction
>>> autocompaction_on_startup_enabled
>>> auto_optimise_inc_repair_streams
>>> auto_optimise_full_repair_streams
>>> auto_optimise_preview_repair_streams
>>> consecutive_message_errors_threshold
>>> internode_error_reporting_exclusions
>>> compact_tables_enabled
>>> vector_type_enabled
>>> intersect_filtering_query_warned
>>> intersect_filtering_query_enabled
>>> streaming_slow_events_log_timeout
>>> repair_state_expires
>>> repair_state_size
>>> paxos_variant
>>> skip_paxos_repair_on_topology_change
>>> paxos_purge_grace_period
>>> paxos_on_linearizability_violations
>>> paxos_state_purging
>>> paxos_repair_enabled
>>> paxos_topology_repair_no_dc_checks
>>> paxos_topology_repair_strict_each_quorum
>>> skip_paxos_repair_on_topology_change_keyspaces
>>> paxos_contention_wait_randomizer
>>> paxos_contention_min_wait
>>> paxos_contention_max_wait
>>> paxos_contention_min_delta
>>> paxos_repair_parallelism
>>> sstable_read_rate_persistence_enabled
>>> client_request_size_metrics_enabled
>>> max_top_size_partition_count
>>> max_top_tombstone_partition_count
>>> min_tracked_partition_size
>>> min_tracked_partition_tombstone_count
>>> top_partitions_enabled
>>> severity_during_decommission
>>> progress_barrier_min_consistency_level
>>> progress_barrier_default_consistency_level
>>> progress_barrier_timeout
>>> progress_barrier_backoff
>>> discovery_timeout
>>> unsafe_tcm_mode
>>> cql_start_time
>>> native_transport_throw_on_overload
>>> native_transport_queue_max_item_age_threshold
>>> native_transport_min_backoff_on_queue_overload
>>> native_transport_max_backoff_on_queue_overload
>>> native_transport_timeout
>>> enforce_native_deadline_for_hints
>>> Total missed:112
>>>
>>>
>>>
>>> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org>
>>> wrote:
>>>
>>>> It should also work the other way around. If there is a property which
>>>> is commented out in yaml and it is not in Config.java, the check should
>>>> fail as well. If it is not commented out and it is not in Config.java,
>>>> that will fail at runtime, as loading fails on an unrecognized property.
>>>>
>>>> This will be needed in practice very rarely, as we seldom remove
>>>> properties from Config, but if we do and a property is commented out, we
>>>> should not ship a dead property name, even commented out.
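
The reverse check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: DeadYamlPropertyCheck, findDeadProperties and the key pattern are hypothetical names, not existing Cassandra code, and the pattern is a heuristic that treats any lower-case snake_case key at column 0 (optionally behind "#" or "# ") as a property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra: reports top-level yaml keys
// (active or commented out) that have no matching Config field name.
class DeadYamlPropertyCheck
{
    // Matches a key at column 0, optionally behind "#" or "# ", e.g.
    //   my_cool_property: true
    //   # my_cool_property: true
    // Only lower-case snake_case names match, so prose comments such as
    // "# Note: ..." and indented (nested) keys are skipped.
    private static final Pattern KEY =
        Pattern.compile("^(?:# ?)?([a-z][a-z0-9_]*):(?:\\s.*)?$");

    static List<String> findDeadProperties(List<String> yamlLines, Set<String> configFieldNames)
    {
        List<String> dead = new ArrayList<>();
        for (String line : yamlLines)
        {
            Matcher m = KEY.matcher(line);
            if (!m.matches())
                continue;
            String name = m.group(1);
            // Report each unknown name once, in order of first appearance.
            if (!configFieldNames.contains(name) && !dead.contains(name))
                dead.add(name);
        }
        return dead;
    }

    public static void main(String[] args)
    {
        List<String> yaml = List.of("my_cool_property: true",
                                    "# auto_bootstrap: true",
                                    "#removed_property: 5");
        Set<String> fields = Set.of("my_cool_property", "auto_bootstrap");
        System.out.println(findDeadProperties(yaml, fields));
    }
}
```

In a real build check the set of names would presumably come from reflection over Config, and a non-empty result would fail the build. A lower-case prose comment such as "# note: ..." would be a false positive with this pattern, so the heuristic would need tuning against the actual file.
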
>>>>
>>>> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>>>>
>>>>> > If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>>>>> > indeed add it, also commented out. I think it would be quite easy to
>>>>> > check against yaml if there is a line starting with
>>>>> > "# my_cool_property" or just with "my_cool_property". Both cases
>>>>> > would satisfy the check.
>>>>>
>>>>> Makes sense, I think this would be good to have as a lint or test to
>>>>> easily catch oversights during review.
>>>>>
>>>>>
>>>>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <
>>>>> smikloso...@apache.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>>>>>>
>>>>>>> > from time to time I see configuration properties in Config.java
>>>>>>> > and they are clearly not in cassandra.yaml. Not every property in
>>>>>>> > Config is in cassandra.yaml. I would like to know if there is some
>>>>>>> > specific reason behind that.
>>>>>>>
>>>>>>> I think one of the original reasons was to "hide" advanced configs
>>>>>>> that are not meant to be updated, unless in very niche circumstances.
>>>>>>> However I think this has been extrapolated to non-advanced settings.
>>>>>>>
>>>>>>> > Question related to that is if we could not have a build-time
>>>>>>> > check that all properties in Config have to be in cassandra.yaml
>>>>>>> > and fail the build if a property in Config does not have its
>>>>>>> > counterpart in yaml.
>>>>>>>
>>>>>>> Are you saying every configuration property should be commented out,
>>>>>>> or do you think that every Config property should be specified in
>>>>>>> cassandra.yaml, uncommented, with its default? One issue with that is
>>>>>>> that you could cause user confusion if you "reveal" a niche/advanced
>>>>>>> config that is not meant to be updated. I think this would be
>>>>>>> addressed by the @HiddenInYaml flag you are proposing in a later post.
>>>>>>>
>>>>>>
>>>>>> Yes, they can stay hidden, but we should annotate them with @Hidden or
>>>>>> similar. As of now, if a property is not in yaml, we just don't know
>>>>>> whether it was forgotten or left out on purpose.
>>>>>>
>>>>>> They can keep being commented out if they currently are. Imagine a
>>>>>> property in Config.java
>>>>>>
>>>>>> public boolean my_cool_property = true;
>>>>>>
>>>>>> and then this in cassandra.yaml
>>>>>>
>>>>>> # my_cool_property: true
>>>>>>
>>>>>> It is completely ok.
>>>>>>
>>>>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>>>>>> indeed add it, also commented out. I think it would be quite easy to
>>>>>> check against yaml if there is a line starting with
>>>>>> "# my_cool_property" or just with "my_cool_property". Both cases would
>>>>>> satisfy the check.
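
A minimal sketch of how the annotation idea and the line check might fit together, assuming a marker annotation along the lines of the @HiddenInYaml / @Hidden proposed in this thread. The annotation, SampleConfig (a stand-in for Config.java) and YamlCoverageCheck are all hypothetical and exist only for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker: the property is deliberately absent from cassandra.yaml.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface HiddenInYaml {}

// Stand-in for Config.java, used only for this sketch.
class SampleConfig
{
    public boolean my_cool_property = true;
    @HiddenInYaml
    public int niche_tuning_knob = 42;
    public String forgotten_property = "x";
    public static final String NOT_A_PROPERTY = "static fields are skipped";
}

class YamlCoverageCheck
{
    // Returns the sorted names of public, non-static, non-hidden fields that
    // appear in the yaml neither as "name:" nor as "#name:" / "# name:".
    static List<String> findUndocumented(Class<?> configClass, List<String> yamlLines)
    {
        List<String> missing = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers())
                || field.isAnnotationPresent(HiddenInYaml.class))
                continue;

            String name = field.getName();
            boolean found = yamlLines.stream().anyMatch(
                line -> line.startsWith(name + ":")
                        || line.startsWith("#" + name + ":")
                        || line.startsWith("# " + name + ":"));
            if (!found)
                missing.add(name);
        }
        // getFields() makes no ordering guarantee, so sort for stable output.
        Collections.sort(missing);
        return missing;
    }

    public static void main(String[] args)
    {
        List<String> yaml = List.of("# my_cool_property: true");
        System.out.println(findUndocumented(SampleConfig.class, yaml));
    }
}
```

With something like this, a property missing from the yaml is either annotated (left out on purpose) or reported (forgotten), which removes the ambiguity described above.
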
>>>>>>
>>>>>>
>>>>>>
>>>>>>> > There are dozens of properties in Config and I have a strong
>>>>>>> > suspicion that we forgot to publish some to yaml, so users do not
>>>>>>> > even know such a property exists, and as of now we do not even
>>>>>>> > know which they are.
>>>>>>>
>>>>>>> I believe this is a problem. I think most properties should be in
>>>>>>> cassandra.yaml, unless they are very advanced or not meant to be
>>>>>>> updated.
>>>>>>>
>>>>>>> Another tangential issue is that there are features/settings that
>>>>>>> don't even have a Config entry, but are just controlled by JVM
>>>>>>> properties.
>>>>>>>
>>>>>>> I think that we should attempt to unify Config and JVM properties
>>>>>>> under a predictable structure. For example, if there is a YAML config
>>>>>>> enable_user_defined_functions, then there should be a respective JVM
>>>>>>> flag -Dcassandra.enable_user_defined_functions, and vice versa.
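
The naming convention suggested above can be illustrated with a small sketch. PropertyNaming and its methods are hypothetical, and the rule that a JVM property overrides the yaml value is an assumption made here for illustration, not something stated in this thread.

```java
import java.util.Properties;

// Hypothetical helper showing a one-to-one mapping between yaml property
// names and "cassandra."-prefixed JVM system properties.
class PropertyNaming
{
    static final String PREFIX = "cassandra.";

    // enable_user_defined_functions -> -Dcassandra.enable_user_defined_functions
    static String toJvmFlag(String yamlName)
    {
        return "-D" + PREFIX + yamlName;
    }

    // cassandra.enable_user_defined_functions -> enable_user_defined_functions
    static String toYamlName(String systemPropertyKey)
    {
        if (!systemPropertyKey.startsWith(PREFIX))
            throw new IllegalArgumentException("Not a " + PREFIX + "* property: " + systemPropertyKey);
        return systemPropertyKey.substring(PREFIX.length());
    }

    // Assumption for this sketch: a JVM property, when set, wins over yaml.
    static String resolve(String yamlName, String yamlValue, Properties systemProperties)
    {
        return systemProperties.getProperty(PREFIX + yamlName, yamlValue);
    }

    public static void main(String[] args)
    {
        System.out.println(toJvmFlag("enable_user_defined_functions"));
        System.out.println(resolve("enable_user_defined_functions", "false", System.getProperties()));
    }
}
```
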
>>>>>>>
>>>>>>
>>>>>> Yeah, good idea.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <
>>>>>>> smikloso...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> from time to time I see configuration properties in Config.java and
>>>>>>>> they are clearly not in cassandra.yaml. Not every property in Config
>>>>>>>> is in cassandra.yaml. I would like to know if there is some specific
>>>>>>>> reason behind that.
>>>>>>>>
>>>>>>>> Question related to that is if we could not have a build-time check
>>>>>>>> that all properties in Config have to be in cassandra.yaml and fail the
>>>>>>>> build if a property in Config does not have its counterpart in yaml.
>>>>>>>>
>>>>>>>> There are dozens of properties in Config and I have a strong
>>>>>>>> suspicion that we forgot to publish some to yaml, so users do not
>>>>>>>> even know such a property exists, and as of now we do not even know
>>>>>>>> which they are.
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Dmitry Konstantinov
>>>
>>
>
> --
> Dmitry Konstantinov
>
