A very primitive implementation of the first idea below (checking that every
property in Config.java also appears in cassandra.yaml):

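import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.config.Config;

// The statements below are meant to run inside a method that declares
// "throws Exception" (new URL, toURI and readAllLines throw checked exceptions).
String configUrl =
"file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";

// Every public non-static field of Config is treated as a top-level property.
Field[] allFields = Config.class.getFields();
List<String> topLevelPropertyNames = new ArrayList<>();
for (Field field : allFields)
{
    if (!Modifier.isStatic(field.getModifiers()))
    {
        topLevelPropertyNames.add(field.getName());
    }
}

URL url = new URL(configUrl);
List<String> lines = Files.readAllLines(Paths.get(url.toURI()));

// A property counts as present if some line starts with "name:",
// "#name:" or "# name:" (i.e. it is set or commented out at top level).
int missedCount = 0;
for (String propertyName : topLevelPropertyNames)
{
    boolean found = false;
    for (String line : lines)
    {
        if (line.startsWith(propertyName + ":")
            || line.startsWith("#" + propertyName + ":")
            || line.startsWith("# " + propertyName + ":"))
        {
            found = true;
            break;
        }
    }
    if (!found)
    {
        missedCount++;
        System.out.println(propertyName);
    }
}
System.out.println("Total missed:" + missedCount);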


It prints the following config property names, which are defined in
Config.java but do not appear in the file at the start of a line as
"property:", "#property:" or "# property:":

permissions_cache_max_entries
roles_cache_max_entries
credentials_cache_max_entries
auto_bootstrap
force_new_prepared_statement_behaviour
use_deterministic_table_id
repair_request_timeout
stream_transfer_task_timeout
cms_await_timeout
cms_default_max_retries
cms_default_retry_backoff
epoch_aware_debounce_inflight_tracker_max_size
metadata_snapshot_frequency
available_processors
repair_session_max_tree_depth
use_offheap_merkle_trees
internode_max_message_size
native_transport_max_message_size
native_transport_max_request_data_in_flight_per_ip
native_transport_max_request_data_in_flight
native_transport_receive_queue_capacity
min_free_space_per_drive
max_space_usable_for_compactions_in_percentage
reject_repair_compaction_threshold
concurrent_index_builders
max_streaming_retries
commitlog_max_compression_buffers_in_pool
max_mutation_size
dynamic_snitch
failure_detector
use_creation_time_for_hint_ttl
key_cache_migrate_during_compaction
key_cache_invalidate_after_sstable_deletion
paxos_cache_size
file_cache_round_up
disk_optimization_estimate_percentile
disk_optimization_page_cross_chance
purgeable_tobmstones_metric_granularity
windows_timer_interval
otc_coalescing_strategy
otc_coalescing_window_us
otc_coalescing_enough_coalesced_messages
otc_backlog_expiration_interval_ms
scripted_user_defined_functions_enabled
user_defined_functions_threads_enabled
allow_insecure_udfs
allow_extra_insecure_udfs
user_defined_functions_warn_timeout
user_defined_functions_fail_timeout
user_function_timeout_policy
back_pressure_enabled
back_pressure_strategy
repair_command_pool_full_strategy
repair_command_pool_size
block_for_peers_timeout_in_secs
block_for_peers_in_remote_dcs
skip_stream_disk_space_check
snapshot_on_repaired_data_mismatch
validation_preview_purge_head_start
initial_range_tombstone_list_allocation_size
range_tombstone_list_growth_factor
snapshot_on_duplicate_row_detection
check_for_duplicate_rows_during_reads
check_for_duplicate_rows_during_compaction
autocompaction_on_startup_enabled
auto_optimise_inc_repair_streams
auto_optimise_full_repair_streams
auto_optimise_preview_repair_streams
consecutive_message_errors_threshold
internode_error_reporting_exclusions
compact_tables_enabled
vector_type_enabled
intersect_filtering_query_warned
intersect_filtering_query_enabled
streaming_slow_events_log_timeout
repair_state_expires
repair_state_size
paxos_variant
skip_paxos_repair_on_topology_change
paxos_purge_grace_period
paxos_on_linearizability_violations
paxos_state_purging
paxos_repair_enabled
paxos_topology_repair_no_dc_checks
paxos_topology_repair_strict_each_quorum
skip_paxos_repair_on_topology_change_keyspaces
paxos_contention_wait_randomizer
paxos_contention_min_wait
paxos_contention_max_wait
paxos_contention_min_delta
paxos_repair_parallelism
sstable_read_rate_persistence_enabled
client_request_size_metrics_enabled
max_top_size_partition_count
max_top_tombstone_partition_count
min_tracked_partition_size
min_tracked_partition_tombstone_count
top_partitions_enabled
severity_during_decommission
progress_barrier_min_consistency_level
progress_barrier_default_consistency_level
progress_barrier_timeout
progress_barrier_backoff
discovery_timeout
unsafe_tcm_mode
cql_start_time
native_transport_throw_on_overload
native_transport_queue_max_item_age_threshold
native_transport_min_backoff_on_queue_overload
native_transport_max_backoff_on_queue_overload
native_transport_timeout
enforce_native_deadline_for_hints
Total missed:112
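
For the other direction raised in the quoted thread below (a name that is
present in cassandra.yaml, set or commented out, but no longer exists in
Config.java), a rough sketch could look like this. The class name, the method
name and the regular expression are my own assumptions, not existing code.
The pattern is only a heuristic: it treats any line starting with "name:",
"#name:" or "# name:" (lower-case snake_case, column 0) as a property, so a
prose comment such as "# note: ..." would be reported as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YamlReverseCheck
{
    // Heuristic (assumption): a top-level property line is "name:", "#name:"
    // or "# name:", with a lower-case snake_case name starting in column 0.
    private static final Pattern PROPERTY_LINE =
        Pattern.compile("^(?:# ?)?([a-z][a-z0-9_]*):");

    /** Returns names found in the yaml lines that are not in knownNames. */
    public static List<String> findUnknown(List<String> yamlLines, Set<String> knownNames)
    {
        List<String> unknown = new ArrayList<>();
        for (String line : yamlLines)
        {
            Matcher m = PROPERTY_LINE.matcher(line);
            if (m.find())
            {
                String name = m.group(1);
                // Report each unknown name once, in order of appearance.
                if (!knownNames.contains(name) && !unknown.contains(name))
                    unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args)
    {
        // In the real check, knownNames would be built from Config.class.getFields().
        Set<String> known = Set.of("cluster_name", "auto_bootstrap");
        List<String> yaml = List.of(
            "cluster_name: 'Test Cluster'",
            "# auto_bootstrap: true",
            "# removed_property: 42",
            "# See the docs for details.",
            "    nested_key: value");
        System.out.println(findUnknown(yaml, known)); // prints [removed_property]
    }
}
```

Indented lines are skipped on purpose, so nested keys of complex properties
are not compared against the top-level field names.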



On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <smikloso...@apache.org>
wrote:

> It should also work the other way around. If there is a property which is
> commented out in yaml and it is not in Config.java, that should fail as
> well. If it is not commented out and it is not in Config.java, that will
> fail at runtime, as config loading fails on an unrecognized property.
>
> In practice this will be needed very rarely, as we seldom remove
> properties from Config, but if we do and a property is commented out, we
> should not ship a dead property name, even commented out.
>
> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <pa...@apache.org> wrote:
>
>> >  >  If "# my_cool_property: true" is NOT in cassandra.yaml, we might
>> indeed add it, also commented out. I think it would be quite easy to check
>> against yaml if there is a line starting with "# my_cool_property" or just
>> with "my_cool_property". Both cases would satisfy the check.
>>
>> Makes sense, I think this would be good to have as a lint or test to
>> easily catch overlooks during review.
>>
>>
>> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <pa...@apache.org> wrote:
>>>
>>>> > from time to time I see configuration properties in Config.java and
>>>> they are clearly not in cassandra.yaml. Not every property in Config is in
>>>> cassandra.yaml. I would like to know if there is some specific reason
>>>> behind that.
>>>>
>>>> I think one of the original reasons was to "hide" advanced configs that
>>>> are not meant to be updated, unless in very niche circumstances. However I
>>>> think this has been extrapolated to non-advanced settings.
>>>>
>>>> > A related question is whether we could have a build-time check
>>>> that all properties in Config have to be in cassandra.yaml and fail the
>>>> build if a property in Config does not have its counterpart in yaml.
>>>>
>>>> Are you saying every configuration property should be commented out, or
>>>> do you think that every Config property should be specified in
>>>> cassandra.yaml with its default, uncommented? One issue with that is that
>>>> you could cause user confusion if you "reveal" a niche/advanced config that
>>>> is not meant to be updated. I think this would be addressed by
>>>> the @HiddenInYaml flag you are proposing in a later post.
>>>>
>>>
>>> Yes, they can stay hidden, but we should annotate them with @Hidden or
>>> similar. As of now, if a property is not in yaml, we just don't know
>>> whether it was forgotten or left out on purpose.
>>>
>>> They can keep being commented out if they currently are. Imagine a
>>> property in Config.java
>>>
>>> public boolean my_cool_property = true;
>>>
>>> and then this in cassandra.yaml
>>>
>>> # my_cool_property: true
>>>
>>> It is completely ok.
>>>
>>> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed
>>> add it, also commented out. I think it would be quite easy to check against
>>> yaml if there is a line starting with "# my_cool_property" or just with
>>> "my_cool_property". Both cases would satisfy the check.
>>>
>>>
>>>
>>>> > There are dozens of properties in Config and I have a strong
>>>> suspicion that we failed to publish some to yaml, so users do not even know
>>>> such a property exists and as of now we do not even know which they are.
>>>>
>>>> I believe this is a problem. I think most properties should be in
>>>> cassandra.yaml, unless they are very advanced or not meant to be updated.
>>>>
>>>> Another tangential issue is that there are features/settings that don't
>>>> even have a Config entry, but are just controlled by JVM properties.
>>>>
>>>> I think that we should attempt to unify Config and jvm properties under
>>>> a predictable structure. For example, if there is a YAML config
>>>> enable_user_defined_functions, then there should be a respective JVM flag
>>>> -Dcassandra.enable_user_defined_functions, and vice versa.
>>>>
>>>
>>> Yeah, good idea.
>>>
>>>
>>>>
>>>> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> from time to time I see configuration properties in Config.java and
>>>>> they are clearly not in cassandra.yaml. Not every property in Config is in
>>>>> cassandra.yaml. I would like to know if there is some specific reason
>>>>> behind that.
>>>>>
>>>>> A related question is whether we could have a build-time check
>>>>> that all properties in Config have to be in cassandra.yaml and fail the
>>>>> build if a property in Config does not have its counterpart in yaml.
>>>>>
>>>>> There are dozens of properties in Config and I have a strong suspicion
>>>>> that we failed to publish some to yaml, so users do not even know such a
>>>>> property exists and as of now we do not even know which they are.
>>>>>
>>>>
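
On the @Hidden / @HiddenInYaml idea from the thread above: if such a marker
were introduced, the field collection step of the check could simply skip
annotated fields. A minimal sketch, assuming a runtime-retained field
annotation; the annotation, the DemoConfig stand-in for Config.java and the
method name are all hypothetical and do not exist in Cassandra as far as I
know.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class HiddenInYamlSketch
{
    /** Hypothetical marker for properties intentionally left out of yaml. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface HiddenInYaml {}

    /** Stand-in for Config.java, for illustration only. */
    public static class DemoConfig
    {
        public static final String IGNORED_CONSTANT = "x";

        public boolean my_cool_property = true;

        @HiddenInYaml
        public int niche_tuning_knob = 1;
    }

    /** Names of public, non-static, non-hidden fields: what yaml must mention. */
    public static List<String> namesExpectedInYaml(Class<?> configClass)
    {
        List<String> names = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers()))
                continue; // constants are not configuration properties
            if (field.isAnnotationPresent(HiddenInYaml.class))
                continue; // deliberately hidden, so not required in yaml
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args)
    {
        System.out.println(namesExpectedInYaml(DemoConfig.class)); // prints [my_cool_property]
    }
}
```

With something like this, a name missing from yaml would mean "forgotten"
rather than "maybe hidden on purpose", because the intent is recorded on the
field itself.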

-- 
Dmitry Konstantinov
