Compaction Throttling
---------------------

                 Key: CASSANDRA-2156
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Stu Hood
             Fix For: 0.8


Compaction is currently relatively bursty: we compact as fast as we can, and 
then we wait for the next compaction to be possible ("hurry up and wait").

Instead, to properly amortize compaction, you'd like to compact exactly as fast 
as you need to in order to keep the sstable count under control.

For every new level of compaction, you need to increase the rate at which you 
compact: a rule of thumb that we're testing on our clusters is to determine 
the maximum number of buckets a node can support (i.e., if the 15th bucket holds 
750 GB, we're not going to have more than 15 buckets), and then multiply the 
flush throughput by the number of buckets to get the minimum compaction 
throughput needed to maintain your sstable count.
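The rule of thumb above can be sketched numerically (a minimal illustration with hypothetical numbers, not Cassandra code; the function name and the 8 MB/s flush rate are invented for the example):

```python
def min_compaction_throughput(flush_mb_per_s, max_buckets):
    """Minimum sustained compaction rate (MB/s) needed to keep the
    sstable count under control: flush throughput times the maximum
    number of buckets the node can support."""
    return flush_mb_per_s * max_buckets

# A node that flushes at 8 MB/s and can support at most 15 buckets
# needs to compact at >= 8 * 15 = 120 MB/s on average.
print(min_compaction_throughput(8, 15))  # -> 120
```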

Full explanation: for a min compaction threshold of {{T}}, the bucket at level 
{{N}} can contain {{SsubN = T^N}} 'units' (unit == memtable's worth of data on 
disk). Every time a new unit is added, it has a {{1/SsubN}} chance of causing 
the bucket at level N to fill. If the bucket at level N fills, it causes 
{{SsubN}} units to be compacted. So, for each active level in your system you 
have {{SsubN * 1/SsubN}}, or {{1}} amortized unit to compact any time a new 
unit is added.
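The amortized argument can be checked numerically (a sketch under the idealized model above, where the bucket at level {{N}} compacts exactly {{T^N}} units every {{T^N}} additions; function name is invented for the example):

```python
def compaction_work(T, units, levels):
    """Total units compacted after `units` memtable flushes, with min
    compaction threshold T and `levels` active bucket levels.  The
    bucket at level N (holding T**N units) fills once per T**N units
    added and compacts T**N units each time it fills."""
    work = 0
    for N in range(1, levels + 1):
        size = T ** N
        work += (units // size) * size  # fills so far, times units per fill
    return work

# With T=4 and 3 active levels, after 4**3 = 64 units added each level
# has contributed exactly 64 units of compaction work: 3 * 64 = 192,
# i.e. 1 amortized unit per level per unit added.
print(compaction_work(4, 64, 3))  # -> 192
```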

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
