Branimir Lambov created CASSANDRA-18945:
-------------------------------------------

             Summary: Unified Compaction Strategy is creating too many sstables
                 Key: CASSANDRA-18945
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
             Project: Cassandra
          Issue Type: Bug
          Components: Local/Compaction
            Reporter: Branimir Lambov


The unified compaction strategy currently aims to create sstables with close to 
the same size, defaulting to 1 GiB. Unfortunately tests show that Cassandra 
starts to have performance problems when the number of sstables grows to the 
order of a thousand, and in particular that even 1 TiB of data with the default 
configuration is creating too many sstables for efficient processing. This 
matters even more for SAI, where the number of sstables in the system can have 
a proportional effect on the complexity of operations.

It is quite easy to create a configuration option that allows sstables to take 
some part of the data growth by adding a multiplier to [the shard count 
calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding]
 formula, replacing 
{{2 ^ round(log2(d / (t * b))) * b}} 
with 
{{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}}, 
where 𝜆 is a parameter whose value is between 0 and 1.

With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in 
parallel at the square root of the data size growth. 0 would result in no 
growth, and 1 in always using the same number of shards.

It may also be valuable to introduce a threshold for engaging the base shard 
count to avoid splitting lowest-level sstables into fragments that are too 
small.

Once both of these are in place, we can set defaults that better suit all node 
densities, including 10 TiB and beyond, for example:
 - target size of 1 GiB
 - 𝜆 of 1/3
 - base shard count of 4
 - minimum size 100 MiB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to