[ https://issues.apache.org/jira/browse/CASSANDRA-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480198#comment-17480198 ]

David Capwell commented on CASSANDRA-17212:
-------------------------------------------

Maybe we can find agreement more easily if we look at specific features and 
debate how they should work; once we do that, it will be easier to move on.

Currently we have the following (I made up my own config values; these are not the defaults):

{code}
track_warnings: # planned to be merged into guardrails
    enabled: true
    coordinator_read_size:
        warn_threshold: 1g
        abort_threshold: 2g
    local_read_size:
        warn_threshold: 1g
        abort_threshold: 2g
    row_index_size:
        warn_threshold: 1g
        abort_threshold: 2g
replica_filtering_protection:
    cached_rows_warn_threshold: 2000
    cached_rows_fail_threshold: 32000
guardrails:
    enabled: false
    keyspaces:
        warn_threshold: -1
        abort_threshold: -1
    tables:
        warn_threshold: -1
        abort_threshold: -1
    columns_per_table:
        warn_threshold: -1
        abort_threshold: -1
    secondary_indexes_per_table:
        warn_threshold: -1
        abort_threshold: -1
    materialized_views_per_table:
        warn_threshold: -1
        abort_threshold: -1
    table_properties:
        ignored: []
        disallowed: []
    user_timestamps_enabled: true
    page_size:
        warn_threshold: -1
        abort_threshold: -1
    read_before_write_list_operations_enabled: true
{code}

[~benedict] looks to be proposing:

{code}
limits:
  concurrency:
    reads: 32
    writes: 32
    counter_writes: 32
    materialized_view_writes: 32
    clients: 128
    hint_delivery: 2
    flush: 2
    compaction: 1
    repair: 0
    auto_sstable_upgrades: 1
  throughput:
    streaming:
      local: 25MiB/s
      remote: 25MiB/s
    batchlog: 1MiB/s
    compaction: 16MiB/s
    hint_delivery: 1MiB/s
  capacity:
    memtable:
      heap: 2048mb
      offheap: 2048mb
    caching:
      compressed_chunks: 512MiB
      key_index:
        row_index: 2KiB
        partitions: 0MiB
    network:
      tcp:
        send_buffer: 512MiB
        recv_buffer: 512MiB
      connection:
        send_queue: 4MiB
        recv_queue: 4MiB
      endpoint:
        send_queue: 128MiB
        recv_queue: 128MiB
      global:
        send_queue: 512MiB
        recv_queue: 512MiB
  info:
    gc_pause: 200ms
  warn:
    gc_pause: 1000ms
    large_partition: 100mb
    tombstones: 1000
    batch_size: 5kb
    partitions_in_unlogged_batch: 10
  fail:
    tombstones: 100000
    batch_size: 50kb
    corrupt_value_size: 256mb
{code}

The open debate I am seeing is about grouping within the "limits" section (I 
don't see anyone disagreeing about having a limits section): do we group under 
info/warn/fail, or do we group by feature, where each feature offers its own 
info/warn/fail?

aka (everything below is under the limits section; I'm typing this in the browser, so the spacing may be off)

[~benedict]

{code}
  info:
    gc_pause: 200ms
  warn:
    gc_pause: 1000ms
    large_partition: 100mb
    tombstones: 1000
    batch_size: 5kb
    partitions_in_unlogged_batch: 10
    replica_filtering_protection_cached_rows_threshold: 2000
    coordinator_read_size: 1g
    local_read_size: 1g
    row_index_size: 1g
  fail:
    tombstones: 100000
    batch_size: 50kb
    corrupt_value_size: 256mb
    replica_filtering_protection_cached_rows_threshold: 32000
    coordinator_read_size: 2g
    local_read_size: 2g
    row_index_size: 2g
{code}

vs

{code}
  replica_filtering_protection:
    cached_rows:
      warn: 2000
      fail: 32000
  gc_pause:
    info: 200ms
    warn: 1s
  large_partition:
    warn: 100mb
    fail: 1gb # being added in CASSANDRA-17258
  batch_size:
    warn: 5kb
    fail: 50kb
  coordinator_read_size:
    warn: 1g
    fail: 2g
  local_read_size:
    warn: 1g
    fail: 2g
  row_index_size:
    warn: 1g
    fail: 2g
{code}

Personally I am in favor of the second option; I find that grouping based on 
the feature/domain is best, for the following reasons:
1) everything for a single topic is together
2) if people don't like nested structures, the names make more sense 
(CASSANDRA-17166 will add support for dot-notation names for people who don't 
like nesting: limits.large_partition.warn: 100mb vs limits.warn.large_partition: 
100mb); this I admit is a personal preference
3) it is easier to maintain in code, since naming isn't copy/pasted across 
multiple objects; instead we can use shared data types (in track_warnings I 
constantly had inconsistencies until I moved to shared data types; they really 
help with maintaining standard names); a sketch of what I mean follows this list
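A minimal sketch of the shared-data-type idea (hypothetical class and field 
names, not the actual track_warnings or Cassandra config classes): one 
Threshold type owns the warn/fail naming and the "-1 disables" convention, and 
every feature reuses it, so the names cannot drift between features.

{code}
// A minimal sketch of the shared-data-type idea; hypothetical names, not
// the actual Cassandra config classes.
public class Limits
{
    // One shared type owns the warn/fail naming and the "-1 disables" rule,
    // so the convention is defined once instead of per feature.
    public static class Threshold
    {
        public long warn = -1; // -1 disables the warn threshold
        public long fail = -1; // -1 disables the fail threshold

        public boolean shouldWarn(long value) { return warn > 0 && value > warn; }
        public boolean shouldFail(long value) { return fail > 0 && value > fail; }
    }

    // Every feature reuses the same type; a YAML loader that maps nested
    // keys onto fields would bind limits.large_partition.warn here.
    public Threshold large_partition = new Threshold();
    public Threshold batch_size = new Threshold();
    public Threshold coordinator_read_size = new Threshold();
    public Threshold local_read_size = new Threshold();
    public Threshold row_index_size = new Threshold();
}
{code}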

If we can agree on a final structure, then the guardrail patches can move the 
new stuff in that direction, and others are free to migrate existing configs to 
the new structure.
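
As a concrete example: under the second option, the replication_factor 
guardrail proposed in this ticket would look something like the following 
(mirroring the example values from the issue description below):

{code}
limits:
  replication_factor:
    warn: 2
    fail: 3
{code}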

> Migrate threshold for minimum keyspace replication factor to guardrails
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-17212
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17212
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Andres de la Peña
>            Priority: Normal
>
> The config property 
> [{{minimum_keyspace_rf}}|https://github.com/apache/cassandra/blob/5fdadb25f95099b8945d9d9ee11d3e380d3867f4/conf/cassandra.yaml]
>  that was added by CASSANDRA-14557 can be migrated to guardrails, for example:
> {code}
> guardrails:
>     ...
>     replication_factor:
>         warn_threshold: 2
>         abort_threshold: 3
> {code}


