A very primitive implementation of the 1st idea below:

String configUrl = "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
// collect the names of all public non-static fields of Config
Field[] allFields = Config.class.getFields();
List<String> topLevelPropertyNames = new ArrayList<>();
for (Field field : allFields) {
    if (!Modifier.isStatic(field.getModifiers())) {
        topLevelPropertyNames.add(field.getName());
    }
}
URL url = new URL(configUrl);
List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
int missedCount = 0;
for (String propertyName : topLevelPropertyNames) {
    boolean found = false;
    for (String line : lines) {
        // a property counts as present whether it is active or commented out
        if (line.startsWith(propertyName + ":")
            || line.startsWith("#" + propertyName + ":")
            || line.startsWith("# " + propertyName + ":")) {
            found = true;
            break;
        }
    }
    if (!found) {
        missedCount++;
        System.out.println(propertyName);
    }
}
System.out.println("Total missed:" + missedCount);

It prints the following config property names, which are defined in Config.java but are not present in the file as "property:", "#property:" or "# property:":

permissions_cache_max_entries
roles_cache_max_entries
credentials_cache_max_entries
auto_bootstrap
force_new_prepared_statement_behaviour
use_deterministic_table_id
repair_request_timeout
stream_transfer_task_timeout
cms_await_timeout
cms_default_max_retries
cms_default_retry_backoff
epoch_aware_debounce_inflight_tracker_max_size
metadata_snapshot_frequency
available_processors
repair_session_max_tree_depth
use_offheap_merkle_trees
internode_max_message_size
native_transport_max_message_size
native_transport_max_request_data_in_flight_per_ip
native_transport_max_request_data_in_flight
native_transport_receive_queue_capacity
min_free_space_per_drive
max_space_usable_for_compactions_in_percentage
reject_repair_compaction_threshold
concurrent_index_builders
max_streaming_retries
commitlog_max_compression_buffers_in_pool
max_mutation_size
dynamic_snitch
failure_detector
use_creation_time_for_hint_ttl
key_cache_migrate_during_compaction
key_cache_invalidate_after_sstable_deletion
paxos_cache_size
file_cache_round_up
disk_optimization_estimate_percentile
disk_optimization_page_cross_chance
purgeable_tobmstones_metric_granularity
windows_timer_interval
otc_coalescing_strategy
otc_coalescing_window_us
otc_coalescing_enough_coalesced_messages
otc_backlog_expiration_interval_ms
scripted_user_defined_functions_enabled
user_defined_functions_threads_enabled
allow_insecure_udfs
allow_extra_insecure_udfs
user_defined_functions_warn_timeout
user_defined_functions_fail_timeout
user_function_timeout_policy
back_pressure_enabled
back_pressure_strategy
repair_command_pool_full_strategy
repair_command_pool_size
block_for_peers_timeout_in_secs
block_for_peers_in_remote_dcs
skip_stream_disk_space_check
snapshot_on_repaired_data_mismatch
validation_preview_purge_head_start
initial_range_tombstone_list_allocation_size
range_tombstone_list_growth_factor
snapshot_on_duplicate_row_detection
check_for_duplicate_rows_during_reads
check_for_duplicate_rows_during_compaction
autocompaction_on_startup_enabled
auto_optimise_inc_repair_streams
auto_optimise_full_repair_streams
auto_optimise_preview_repair_streams
consecutive_message_errors_threshold
internode_error_reporting_exclusions
compact_tables_enabled
vector_type_enabled
intersect_filtering_query_warned
intersect_filtering_query_enabled
streaming_slow_events_log_timeout
repair_state_expires
repair_state_size
paxos_variant
skip_paxos_repair_on_topology_change
paxos_purge_grace_period
paxos_on_linearizability_violations
paxos_state_purging
paxos_repair_enabled
paxos_topology_repair_no_dc_checks
paxos_topology_repair_strict_each_quorum
skip_paxos_repair_on_topology_change_keyspaces
paxos_contention_wait_randomizer
paxos_contention_min_wait
paxos_contention_max_wait
paxos_contention_min_delta
paxos_repair_parallelism
sstable_read_rate_persistence_enabled
client_request_size_metrics_enabled
max_top_size_partition_count
max_top_tombstone_partition_count
min_tracked_partition_size
min_tracked_partition_tombstone_count
top_partitions_enabled
severity_during_decommission
progress_barrier_min_consistency_level
progress_barrier_default_consistency_level
progress_barrier_timeout
progress_barrier_backoff
discovery_timeout
unsafe_tcm_mode
cql_start_time
native_transport_throw_on_overload
native_transport_queue_max_item_age_threshold
native_transport_min_backoff_on_queue_overload
native_transport_max_backoff_on_queue_overload
native_transport_timeout
enforce_native_deadline_for_hints
Total missed:112

On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org> wrote:

> It should also work the other way around. If there is a property which is
> commented out in yaml and it is not in Config.java, that should fail as
> well. If it is not commented out and it is not in Config.java, that will
> fail in runtime as it fails on unrecognized property.
>
> This will be used in practice very rarely as we seldom remove the
> properties in Config but if we do and a property is commented out, we
> should not ship a dead property name, even commented out.
>
> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>
>> > > If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>> indeed add it, also commented out. I think it would be quite easy to check
>> against yaml if there is a line starting on "# my_cool_property" or just on
>> "my_cool_property". Both cases would satisfy the check.
>>
>> Makes sense, I think this would be good to have as a lint or test to
>> easily catch overlooks during review.
>>
>>
>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>>>
>>>> > from time to time I see configuration properties in Config.java and
>>>> they are clearly not in cassandra.yaml. Not every property in Config is in
>>>> cassandra.yaml. I would like to know if there is some specific reason
>>>> behind that.
>>>>
>>>> I think one of the original reasons was to "hide" advanced configs that
>>>> are not meant to be updated, unless in very niche circumstances. However I
>>>> think this has been extrapolated to non-advanced settings.
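A rough sketch of how the two-way check discussed in this thread could look as a test, with a marker for properties that are left out of the yaml on purpose. Everything in it is an assumption for illustration: SampleConfig stands in for Config.java, @HiddenInYaml is a hypothetical annotation (only the name comes from the thread), and the regex only recognises top-level keys written as "name:", "#name:" or "# name:".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YamlConfigCheck {
    /** Hypothetical marker for fields intentionally absent from the yaml. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface HiddenInYaml {}

    /** Stand-in for Config.java; the field names are made up. */
    static class SampleConfig {
        public boolean my_cool_property = true;
        public int documented_property = 1;
        public int forgotten_property = 2;
        @HiddenInYaml
        public int advanced_property = 3;
        public static final String IGNORED_CONSTANT = "x";
    }

    // Top-level key, either active ("name:") or commented out ("#name:", "# name:").
    private static final Pattern TOP_LEVEL_KEY =
            Pattern.compile("^(?:# ?)?([a-z][a-z0-9_]*):");

    /** Names of all top-level keys found in the yaml lines. */
    static Set<String> yamlKeys(List<String> yamlLines) {
        Set<String> keys = new TreeSet<>();
        for (String line : yamlLines) {
            Matcher m = TOP_LEVEL_KEY.matcher(line);
            if (m.find()) {
                keys.add(m.group(1));
            }
        }
        return keys;
    }

    /** Public non-static field names, optionally including hidden ones. */
    static Set<String> fieldNames(Class<?> configClass, boolean includeHidden) {
        Set<String> names = new TreeSet<>();
        for (Field field : configClass.getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            if (!includeHidden && field.isAnnotationPresent(HiddenInYaml.class)) {
                continue;
            }
            names.add(field.getName());
        }
        return names;
    }

    /** Non-hidden config fields with no active or commented-out yaml key. */
    static Set<String> missingFromYaml(Class<?> configClass, List<String> yamlLines) {
        Set<String> missing = fieldNames(configClass, false);
        missing.removeAll(yamlKeys(yamlLines));
        return missing;
    }

    /** Yaml keys (active or commented out) with no matching config field. */
    static Set<String> deadInYaml(Class<?> configClass, List<String> yamlLines) {
        Set<String> dead = yamlKeys(yamlLines);
        dead.removeAll(fieldNames(configClass, true));
        return dead;
    }

    public static void main(String[] args) {
        List<String> yaml = List.of(
                "# my_cool_property: true",
                "documented_property: 1",
                "#removed_property: 5",
                "    nested_key: 1",
                "# A plain comment line");
        System.out.println("Missing from yaml: " + missingFromYaml(SampleConfig.class, yaml));
        System.out.println("Dead in yaml: " + deadInYaml(SampleConfig.class, yaml));
    }
}
```

One caveat with this approach: a prose comment that happens to look like a key, for example "# default: 5", would be reported as a dead property, so a real check would probably need an allow-list or a stricter convention for commented-out properties.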
>>>>
>>>> > Question related to that is if we could not have a build-time check
>>>> that all properties in Config have to be in cassandra.yaml and fail the
>>>> build if a property in Config does not have its counterpart in yaml.
>>>>
>>>> Are you saying every configuration property should be commented-out, or
>>>> do you think that every Config property should be specified in
>>>> cassandra.yaml with its default uncommented? One issue with that is that
>>>> you could cause user confusion if you "reveal" a niche/advanced config that
>>>> is not meant to be updated. I think this would be addressed by
>>>> the @HiddenInYaml flag you are proposing in a later post.
>>>>
>>>
>>> Yes, they can stay hidden, but we should annotate them with @Hidden or
>>> similar. As of now, if that property is not in yaml, we just don't know if
>>> it was forgotten to be added or if we have not added it on purpose.
>>>
>>> They can keep being commented out if they currently are. Imagine a
>>> property in Config.java
>>>
>>> public boolean my_cool_property = true;
>>>
>>> and then this in cassandra.yaml
>>>
>>> # my_cool_property: true
>>>
>>> It is completely ok.
>>>
>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed
>>> add it, also commented out. I think it would be quite easy to check against
>>> yaml if there is a line starting on "# my_cool_property" or just on
>>> "my_cool_property". Both cases would satisfy the check.
>>>
>>>
>>>
>>>> > There are dozens of properties in Config and I have a strong
>>>> suspicion that we missed to publish some to yaml so users do not even know
>>>> such a property exists and as of now we do not even know which they are.
>>>>
>>>> I believe this is a problem. I think most properties should be in
>>>> cassandra.yaml, unless they are very advanced or not meant to be updated.
>>>>
>>>> Another tangential issue is that there are features/settings that don't
>>>> even have a Config entry, but are just controlled by JVM properties.
>>>>
>>>> I think that we should attempt to unify Config and jvm properties under
>>>> a predictable structure. For example, if there is a YAML config
>>>> enable_user_defined_functions, then there should be a respective JVM flag
>>>> -Dcassandra.enable_user_defined_functions, and vice versa.
>>>>
>>>
>>> Yeah, good idea.
>>>
>>>
>>>>
>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> from time to time I see configuration properties in Config.java and
>>>>> they are clearly not in cassandra.yaml. Not every property in Config is in
>>>>> cassandra.yaml. I would like to know if there is some specific reason
>>>>> behind that.
>>>>>
>>>>> Question related to that is if we could not have a build-time check
>>>>> that all properties in Config have to be in cassandra.yaml and fail the
>>>>> build if a property in Config does not have its counterpart in yaml.
>>>>>
>>>>> There are dozens of properties in Config and I have a strong suspicion
>>>>> that we missed to publish some to yaml so users do not even know such a
>>>>> property exists and as of now we do not even know which they are.
>>>>>
>>>>

-- 
Dmitry Konstantinov
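On the idea quoted above of giving every yaml property a matching -Dcassandra.<name> JVM flag: a minimal sketch of what such a predictable mapping could look like, assuming it is done by reflection over the config object. This is not how Cassandra resolves properties today. SampleConfig, some_numeric_setting and applyOverrides are hypothetical names, and only boolean, int and String fields are handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

final class JvmPropertyOverrides {
    /** Stand-in for Config.java; only the first field name appears in the thread. */
    static class SampleConfig {
        public boolean enable_user_defined_functions = false;
        public int some_numeric_setting = 10;
    }

    // Prefix taken from the -Dcassandra.<name> example in the thread.
    static final String PREFIX = "cassandra.";

    /**
     * For every public non-static field "x", applies the value of the
     * property "cassandra.x" if it is set. Returns the number of fields
     * that were overridden.
     */
    static int applyOverrides(Object config, Properties props) {
        int applied = 0;
        for (Field field : config.getClass().getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            String raw = props.getProperty(PREFIX + field.getName());
            if (raw == null) {
                continue;
            }
            try {
                Class<?> type = field.getType();
                if (type == boolean.class) {
                    field.setBoolean(config, Boolean.parseBoolean(raw));
                } else if (type == int.class) {
                    field.setInt(config, Integer.parseInt(raw));
                } else if (type == String.class) {
                    field.set(config, raw);
                } else {
                    continue; // other field types are out of scope for this sketch
                }
                applied++;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot set " + field.getName(), e);
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        SampleConfig config = new SampleConfig();
        int applied = applyOverrides(config, System.getProperties());
        System.out.println("Overrides applied: " + applied);
        System.out.println("enable_user_defined_functions = " + config.enable_user_defined_functions);
    }
}
```

Running it with -Dcassandra.enable_user_defined_functions=true on the command line would flip the field in the sample class; properties with the prefix that match no field are simply ignored here, although a stricter variant could reject them the same way unrecognized yaml keys are rejected.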