[ 
https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779444#comment-17779444
 ] 

Branimir Lambov commented on CASSANDRA-18945:
---------------------------------------------

Attached [the result of a recent 
benchmark|https://issues.apache.org/jira/secure/attachment/13063855/key-value-oss.html]
 comparing the UCS default (green) to STCS (blue) and an option with larger 
SSTable size (orange). The default UCS has worse results in the throughput 
stage, but more importantly it is unable to serve the 110k ops/s during the 1:1 
and read-only stages. I'm still investigating what makes these reads so 
slow, but switching to a 10 GiB target size fully fixes the problem (the two 
other options the orange graph uses, 'base_shard_count': '1' and 
'max_sstables_to_compact': '32', help but are not as significant on their own).

Rather than ask users to choose a target size based on their expected data 
density, the database should be able to deal with this itself. Letting the 
sstable size absorb some of the data growth is a good way to achieve that.

> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-18945
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Ethan Brown
>            Priority: Normal
>         Attachments: key-value-oss.html
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The unified compaction strategy currently aims to create sstables with close 
> to the same size, defaulting to 1 GiB. Unfortunately tests show that 
> Cassandra starts to have performance problems when the number of sstables 
> grows to the order of a thousand, and in particular that even 1 TiB of data 
> with the default configuration is creating too many sstables for efficient 
> processing. This matters even more for SAI, where the number of sstables in 
> the system can have a proportional effect on the complexity of operations.
> It is quite easy to create a configuration option that allows sstables to 
> take some part of the data growth by adding a multiplier to [the shard count 
> calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding]
>  formula, replacing 
> {{2 ^ round(log2(d / (t * b))) * b}} 
> with 
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}}, 
> where 𝜆 is a parameter whose value is between 0 and 1.
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in 
> parallel at the square root of the data size growth. A 𝜆 of 0 would result 
> in no sstable size growth (the current behaviour), and a 𝜆 of 1 in always 
> using the same number of shards.
> It may also be valuable to introduce a threshold for engaging the base shard 
> count to avoid splitting lowest-level sstables into fragments that are too 
> small.
> Once both of these are in place, we can set defaults that better suit all 
> node densities, including 10 TiB and beyond, for example:
>  - target size of 1 GiB
>  - 𝜆 of 1/3
>  - base shard count of 4
>  - minimum size 100 MiB
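
To make the effect of 𝜆 in the quoted formula concrete, here is a minimal 
Python sketch, not the actual Cassandra implementation. The function name 
shard_count and the parameter name growth (standing in for 𝜆) are 
hypothetical. It treats d as the total data size of a single level, which is 
a simplification, and it assumes that the base shard count b is used whenever 
d is below t * b, since the formula as written does not cover that case.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4


def shard_count(d, t, b, growth=0.0):
    """Evaluate 2 ^ round((1 - growth) * log2(d / (t * b))) * b.

    d: data size in bytes (simplification: one level holding all data)
    t: target sstable size in bytes
    b: base shard count
    growth: the lambda parameter from the issue, between 0 and 1
            (hypothetical name)
    """
    if not 0.0 <= growth <= 1.0:
        raise ValueError("growth must be between 0 and 1")
    ratio = d / (t * b)
    if ratio < 1:
        # Assumption: fall back to the base shard count for small data.
        return b
    return 2 ** round((1 - growth) * math.log2(ratio)) * b


if __name__ == "__main__":
    # Compare shard count and resulting sstable size for a few lambda values.
    for growth in (0.0, 1 / 3, 0.5, 1.0):
        shards = shard_count(1 * TIB, 1 * GIB, 4, growth)
        size_gib = TIB / shards / GIB
        print(f"lambda={growth:.2f}: {shards} shards of {size_gib:.0f} GiB")
```

Under these assumptions, 1 TiB of data with a 1 GiB target and a base shard 
count of 4 splits into 1024 shards of 1 GiB when 𝜆 is 0, but into 128 shards 
of about 8 GiB when 𝜆 is 1/3.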



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
