Might be a bit of a balance between exposing what people actually are likely to need to modify vs having a super intimidating config file. It's already nearly 2000 lines. Personally I'd rather see some auto-documentation or something that's in the docs <https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html> than an effort to manually add another 1000 lines.
Chris On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <netud...@gmail.com> wrote: > Maybe I missed some patterns but it looks like a pretty good estimation, I > did like 10 random checks manually to verify :-) > I will try to make an ant target with a similar logic (hopefully, during > the weekend) > I will create a ticket to track this activity (to share attachments there > to not overload the thread with such outputs in future). > > On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <smikloso...@apache.org> > wrote: > >> Oh my god, 112? :DD I was thinking it would be less than 10. >> >> Anyway, I think we need to integrate this to some ant target. If you >> expanded on this, that would be great. >> >> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <netud...@gmail.com> >> wrote: >> >>> A very primitive implementation of the 1st idea below: >>> >>> String configUrl = >>> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml"; >>> Field[] allFields = Config.class.getFields(); >>> List<String> topLevelPropertyNames = new ArrayList<>(); >>> for(Field field : allFields) >>> { >>> if (!Modifier.isStatic(field.getModifiers())) >>> { >>> topLevelPropertyNames.add(field.getName()); >>> } >>> } >>> >>> URL url = new URL(configUrl); >>> List<String> lines = Files.readAllLines(Paths.get(url.toURI())); >>> >>> int missedCount = 0; >>> for (String propertyName : topLevelPropertyNames) >>> { >>> boolean found = false; >>> for (String line : lines) >>> { >>> if (line.startsWith(propertyName + ":") >>> || line.startsWith("#" + propertyName + ":") >>> || line.startsWith("# " + propertyName + ":")) { >>> found = true; >>> break; >>> } >>> } >>> if (!found) >>> { >>> missedCount++; >>> System.out.println(propertyName); >>> } >>> } >>> System.out.println("Total missed:" + missedCount); >>> >>> >>> It prints the following config property names which are defined in >>> Config.java but not present as "property" or "# property " in a file: >>> >>> permissions_cache_max_entries >>> roles_cache_max_entries >>> credentials_cache_max_entries >>> auto_bootstrap >>> force_new_prepared_statement_behaviour >>> use_deterministic_table_id >>> repair_request_timeout >>> stream_transfer_task_timeout >>> cms_await_timeout >>> cms_default_max_retries >>> cms_default_retry_backoff >>> epoch_aware_debounce_inflight_tracker_max_size >>> metadata_snapshot_frequency >>> available_processors >>> repair_session_max_tree_depth >>> use_offheap_merkle_trees >>> internode_max_message_size >>> native_transport_max_message_size >>> native_transport_max_request_data_in_flight_per_ip >>> native_transport_max_request_data_in_flight >>> native_transport_receive_queue_capacity >>> min_free_space_per_drive >>> max_space_usable_for_compactions_in_percentage >>> reject_repair_compaction_threshold >>> concurrent_index_builders >>> max_streaming_retries >>> commitlog_max_compression_buffers_in_pool >>> max_mutation_size >>> dynamic_snitch >>> failure_detector >>> use_creation_time_for_hint_ttl >>> key_cache_migrate_during_compaction >>> key_cache_invalidate_after_sstable_deletion >>> paxos_cache_size >>> file_cache_round_up >>> disk_optimization_estimate_percentile >>> disk_optimization_page_cross_chance >>> purgeable_tobmstones_metric_granularity >>> windows_timer_interval >>> otc_coalescing_strategy >>> otc_coalescing_window_us >>> otc_coalescing_enough_coalesced_messages >>> otc_backlog_expiration_interval_ms >>> scripted_user_defined_functions_enabled >>> user_defined_functions_threads_enabled >>> allow_insecure_udfs >>> allow_extra_insecure_udfs >>> user_defined_functions_warn_timeout >>> user_defined_functions_fail_timeout >>> user_function_timeout_policy >>> back_pressure_enabled >>> back_pressure_strategy >>> repair_command_pool_full_strategy >>> repair_command_pool_size >>> block_for_peers_timeout_in_secs >>> block_for_peers_in_remote_dcs >>> skip_stream_disk_space_check >>> snapshot_on_repaired_data_mismatch >>> validation_preview_purge_head_start >>> initial_range_tombstone_list_allocation_size >>> range_tombstone_list_growth_factor >>> snapshot_on_duplicate_row_detection >>> check_for_duplicate_rows_during_reads >>> check_for_duplicate_rows_during_compaction >>> autocompaction_on_startup_enabled >>> auto_optimise_inc_repair_streams >>> auto_optimise_full_repair_streams >>> auto_optimise_preview_repair_streams >>> consecutive_message_errors_threshold >>> internode_error_reporting_exclusions >>> compact_tables_enabled >>> vector_type_enabled >>> intersect_filtering_query_warned >>> intersect_filtering_query_enabled >>> streaming_slow_events_log_timeout >>> repair_state_expires >>> repair_state_size >>> paxos_variant >>> skip_paxos_repair_on_topology_change >>> paxos_purge_grace_period >>> paxos_on_linearizability_violations >>> paxos_state_purging >>> paxos_repair_enabled >>> paxos_topology_repair_no_dc_checks >>> paxos_topology_repair_strict_each_quorum >>> skip_paxos_repair_on_topology_change_keyspaces >>> paxos_contention_wait_randomizer >>> paxos_contention_min_wait >>> paxos_contention_max_wait >>> paxos_contention_min_delta >>> paxos_repair_parallelism >>> sstable_read_rate_persistence_enabled >>> client_request_size_metrics_enabled >>> max_top_size_partition_count >>> max_top_tombstone_partition_count >>> min_tracked_partition_size >>> min_tracked_partition_tombstone_count >>> top_partitions_enabled >>> severity_during_decommission >>> progress_barrier_min_consistency_level >>> progress_barrier_default_consistency_level >>> progress_barrier_timeout >>> progress_barrier_backoff >>> discovery_timeout >>> unsafe_tcm_mode >>> cql_start_time >>> native_transport_throw_on_overload >>> native_transport_queue_max_item_age_threshold >>> native_transport_min_backoff_on_queue_overload >>> native_transport_max_backoff_on_queue_overload >>> native_transport_timeout >>> enforce_native_deadline_for_hints >>> Total missed:112 >>> >>> >>> >>> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org> >>> wrote: >>> >>>> It should also work the other way around. If there is a property which >>>> is commented out in yaml and it is not in Config.java, that should fail as >>>> well. If it is not commented out and it is not in Config.java, that will >>>> fail in runtime as it fails on unrecognized property. >>>> >>>> This will be used in practice very rarely as we seldom remove the >>>> properties in Config but if we do and a property is commented out, we >>>> should not ship a dead property name, even commented out. >>>> >>>> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote: >>>> >>>>> > > If "# my_cool_property: true" is NOT in cassandra.yaml, we might >>>>> indeed add it, also commented out. I think it would be quite easy to check >>>>> against yaml if there is a line starting on "# my_cool_property" or just >>>>> on >>>>> "my_cool_property". Both cases would satisfy the check. >>>>> >>>>> Makes sense, I think this would be good to have as a lint or test to >>>>> easily catch overlooks during review. >>>>> >>>>> >>>>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič < >>>>> smikloso...@apache.org> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote: >>>>>> >>>>>>> > from time to time I see configuration properties in Config.java >>>>>>> and they are clearly not in cassandra.yaml. Not every property in >>>>>>> Config is >>>>>>> in cassandra.yaml. I would like to know if there is some specific reason >>>>>>> behind that. >>>>>>> >>>>>>> I think one of the original reasons was to "hide" advanced configs >>>>>>> that are not meant to be updated, unless in very niche circumstances. >>>>>>> However I think this has been extrapolated to non-advanced settings. >>>>>>> >>>>>>> > Question related to that is if we could not have a build-time >>>>>>> check that all properties in Config have to be in cassandra.yaml and >>>>>>> fail >>>>>>> the build if a property in Config does not have its counterpart in yaml. >>>>>>> >>>>>>> Are you saying every configuration property should be commented-out, >>>>>>> or do you think that every Config property should be specified in >>>>>>> cassandra.yaml with their default uncomented ? One issue with that is >>>>>>> that >>>>>>> you could cause user confusion if you "reveal" a niche/advanced config >>>>>>> that >>>>>>> is not meant to be updated. I think this would be addressed by >>>>>>> the @HiddenInYaml flag you are proposing in a later post. >>>>>>> >>>>>> >>>>>> Yes, then can stay hidden, but we should annotate it with @Hidden or >>>>>> similar. As of now, if that property is not in yaml, we just don't know >>>>>> if >>>>>> it was forgotten to be added or if we have not added it on purpose. >>>>>> >>>>>> They can keep being commented out if they currently are. Imagine a >>>>>> property in Config.java >>>>>> >>>>>> public boolean my_cool_property = true; >>>>>> >>>>>> and then this in cassandra.yaml >>>>>> >>>>>> # my_cool_property: true >>>>>> >>>>>> It is completely ok. >>>>>> >>>>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might >>>>>> indeed add it, also commented out. I think it would be quite easy to >>>>>> check >>>>>> against yaml if there is a line starting on "# my_cool_property" or just >>>>>> on >>>>>> "my_cool_property". Both cases would satisfy the check. >>>>>> >>>>>> >>>>>> >>>>>>> > There are dozens of properties in Config and I have a strong >>>>>>> suspicion that we missed to publish some to yaml so users do not even >>>>>>> know >>>>>>> such a property exists and as of now we do not even know which they are. >>>>>>> >>>>>>> I believe this is a problem. I think most properties should be in >>>>>>> cassandra.yaml, unless they are very advanced or not meant to be >>>>>>> updated. >>>>>>> >>>>>>> Another tangential issue is that there are features/settings that >>>>>>> don't even have a Config entry, but are just controlled by JVM >>>>>>> properties. >>>>>>> >>>>>>> I think that we should attempt to unify Config and jvm properties >>>>>>> under a predictable structure. For example, if there is a YAML config >>>>>>> enable_user_defined_functions, then there should be a respective JVM >>>>>>> flag >>>>>>> -Dcassandra.enable_user_defined_functions, and vice versa. >>>>>>> >>>>>> >>>>>> Yeah, good idea. >>>>>> >>>>>> >>>>>>> >>>>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič < >>>>>>> smikloso...@apache.org> wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> from time to time I see configuration properties in Config.java and >>>>>>>> they are clearly not in cassandra.yaml. Not every property in Config >>>>>>>> is in >>>>>>>> cassandra.yaml. I would like to know if there is some specific reason >>>>>>>> behind that. >>>>>>>> >>>>>>>> Question related to that is if we could not have a build-time check >>>>>>>> that all properties in Config have to be in cassandra.yaml and fail the >>>>>>>> build if a property in Config does not have its counterpart in yaml. >>>>>>>> >>>>>>>> There are dozens of properties in Config and I have a strong >>>>>>>> suspicion that we missed to publish some to yaml so users do not even >>>>>>>> know >>>>>>>> such a property exists and as of now we do not even know which they >>>>>>>> are. >>>>>>>> >>>>>>> >>> >>> -- >>> Dmitry Konstantinov >>> >> > > -- > Dmitry Konstantinov >