[jira] [Comment Edited] (CASSANDRA-18945) Unified Compaction Strategy is creating too many sstables

Stefan Miklosovic (Jira) Mon, 06 Nov 2023 23:48:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783517#comment-17783517
 ]


Stefan Miklosovic edited comment on CASSANDRA-18945 at 11/7/23 7:47 AM:
------------------------------------------------------------------------

So ... this is interesting. It fails in the j17_jvm_dtests_vnode_repeat
multiplexer as well as individually in j17_jvm_dtests_vnode.

What is interesting is that it does not fail in j17_jvm_dtests_repeat
(without vnodes).

It does not fail in java17_separate_tests either, which also runs
j17_jvm_dtests_vnode. I am trying to run j17_jvm_dtests_vnode_repeat for
java17_separate_tests to see whether it fails there too.

This is the PR against trunk.

This is the branch (2).

(1) https://app.circleci.com/pipelines/github/instaclustr/cassandra/3443/workflows/97bfb70b-146a-4da8-afaf-7f5909b0492d
(2) https://github.com/instaclustr/cassandra/commits/CASSANDRA-18945-trunk



> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-18945
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Ethan Brown
>            Priority: Normal
>             Fix For: 5.0-beta
>
>         Attachments: file_ucs_shenandoah.html, file_ucs_shenandoah_3.html, 
> file_ucs_shenandoah_off_heap_memtable.html, 
> file_ucs_shenandoah_on_heap_memtable_2.html, 
> file_ucs_shenandoah_on_heap_memtable_3.html, key-value-oss.html
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The unified compaction strategy currently aims to create sstables of roughly
> the same size, defaulting to 1 GiB. Unfortunately, tests show that Cassandra
> starts to have performance problems when the number of sstables grows to the
> order of a thousand, and in particular that even 1 TiB of data with the
> default configuration creates too many sstables (about 1024 at 1 GiB each)
> for efficient processing. This matters even more for SAI, where the cost of
> operations can grow in proportion to the number of sstables in the system.
> It is quite easy to create a configuration option that allows sstables to
> take some part of the data growth by adding a multiplier to the exponent in
> the
> [shard count calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding]
> formula, replacing
> {{2 ^ round(log2(d / (t * b))) * b}}
> with
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}},
> where d is the data density, t the target sstable size, b the base shard
> count, and 𝜆 a parameter whose value is between 0 and 1.
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in
> parallel, each by the square root of the data size growth. A 𝜆 of 0 would
> result in no sstable size growth (the current behaviour), and 1 in always
> using the same number of shards.
> It may also be valuable to introduce a threshold for engaging the base shard 
> count to avoid splitting lowest-level sstables into fragments that are too 
> small.
> Once both of these are in place, we can set defaults that better suit all 
> node densities, including 10 TiB and beyond, for example:
>  - target size of 1 GiB
>  - 𝜆 of 1/3
>  - base shard count of 4
>  - minimum size 100 MiB
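The proposed formula and defaults can be sketched in Java as follows. This is a hypothetical illustration, not Cassandra's code. The class and method names are made up, and density d is taken as the data size in MiB. The small-data branch is only one possible reading of the proposed minimum-size threshold, and clamping the exponent at 0 is also an assumption.

```java
// Hypothetical sketch of the shard-count formula proposed in CASSANDRA-18945.
// Class/method names and the small-data threshold rule are illustrative
// assumptions, not Cassandra's actual implementation.
final class ShardCountSketch {

    // Base-2 logarithm.
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // d: data density, t: target sstable size, m: minimum sstable size
    // (all in MiB); b: base shard count; lambda: growth parameter in [0, 1].
    static long shardCount(double d, double t, int b, double lambda, double m) {
        if (d < m * b) {
            // Assumed threshold rule: below m * b the base shard count is not
            // engaged; use the largest power of two that keeps each shard at
            // least m in size, and never fewer than 1 shard.
            return Math.max(1L, Long.highestOneBit((long) Math.floor(d / m)));
        }
        // 2 ^ round((1 - lambda) * log2(d / (t * b))) * b, with the exponent
        // clamped at 0 (an assumption) so the count never drops below b.
        long exponent = Math.round((1 - lambda) * log2(d / (t * b)));
        return (1L << (int) Math.max(0L, exponent)) * b;
    }

    public static void main(String[] args) {
        double t = 1024;             // target size: 1 GiB, in MiB
        int b = 4;                   // base shard count
        double m = 100;              // minimum size: 100 MiB
        double oneTiB = 1024 * 1024; // 1 TiB, in MiB

        for (double lambda : new double[] {0.0, 1.0 / 3, 0.5, 1.0}) {
            long shards = shardCount(oneTiB, t, b, lambda, m);
            System.out.printf("lambda=%.2f: 1 TiB -> %d shards of ~%.0f GiB each%n",
                    lambda, shards, oneTiB / shards / 1024);
        }
    }
}
```

For 1 TiB, the sketch gives different results depending on 𝜆:

- 𝜆 = 0: 1024 shards of about 1 GiB. This is the current behaviour, which produces roughly a thousand sstables.
- 𝜆 = 1/3: 128 shards of about 8 GiB.
- 𝜆 = 0.5: 64 shards of about 16 GiB. The data grew 256x over t * b = 4 GiB, while shard count and sstable size each grew 16x, the square root of that growth.
- 𝜆 = 1: 4 shards of about 256 GiB.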



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
