Björn Hegerfors created CASSANDRA-9013:
------------------------------------------

             Summary: Add new option making DTCS unify larger time windows sooner
                 Key: CASSANDRA-9013
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9013
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Björn Hegerfors
            Priority: Minor


In my very long post on CASSANDRA-6602, I mentioned a more aggressive windowing
strategy, which looks for opportunities to compact into larger SSTables sooner.
In the original approach, when we have min_threshold windows of the same size
and another window of that size appears next to them, those windows (not
including the newest addition) merge. The new approach doesn't wait for a
(min_threshold+1)th window: as soon as min_threshold windows of one size exist,
they merge at once. The only exception is the "incoming window", which is kept
out of any merge until it is no longer the incoming window.
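To make the difference between the two rules concrete, here is a toy model. It
is a sketch under simplifying assumptions, not DTCS's actual code: windows are
reduced to a list of sizes (oldest first, last entry = the incoming window), and
`merge_windows` and its `eager` flag are hypothetical names. As an assumption of
the model, the incoming window is ignored by both rules.

```python
from typing import List


def merge_windows(sizes: List[int], min_threshold: int, eager: bool = True) -> List[int]:
    """Toy model of DTCS window merging (hypothetical, not Cassandra's code).

    `sizes` lists window sizes oldest first; the last entry is the incoming
    window, which neither rule ever merges in this model.
    eager=True:  merge as soon as min_threshold adjacent windows share a size.
    eager=False: wait until a (min_threshold+1)th window of that size sits
                 next to the run, then merge the older min_threshold windows
                 and leave the newest one alone.
    """
    windows = list(sizes)
    run_length = min_threshold if eager else min_threshold + 1
    changed = True
    while changed:
        changed = False
        candidates = windows[:-1]  # exclude the incoming window
        for start in range(len(candidates) - run_length + 1):
            run = candidates[start:start + run_length]
            if all(size == run[0] for size in run):
                # Only the first min_threshold windows of the run are merged.
                windows[start:start + min_threshold] = [run[0] * min_threshold]
                changed = True
                break  # rescan: one merge can complete another group
    return windows


history = [16, 16, 16, 4, 4, 4, 1, 1, 1, 1, 1]  # last 1 = incoming window
print(merge_windows(history, 4))               # [64, 1]
print(merge_windows(history, 4, eager=False))  # unchanged
```

With the eager rule, the newly completed group of size-1 windows cascades all
the way up; the original rule leaves the same history untouched because no
size has a (min_threshold+1)th window yet.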

This does mean that, occasionally and intentionally, more than min_threshold
SSTables, not all of similar size, get compacted together. For example, say
min_threshold is 4 and we have 3 windows of size 16, 3 windows of size 4 and
3 windows of size 1. When a 4th size-1 window appears that isn't the incoming
window, the four size-1 windows form a size-4 window, which completes a group
of four size-4 windows, which in turn completes a group of four size-16
windows. So we immediately merge all of those into a single size-64 window
(3*16 + 3*4 + 4*1 = 64). Typically we expect each window to hold one SSTable,
with a file size proportional to the window size. So in that scenario we merge
roughly 10 SSTables (3 + 3 + 4) in one go.
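The arithmetic of that example can be checked with a second toy model. Under
the same simplifications (window sizes are powers of min_threshold, one SSTable
per window), counting windows per size behaves like counting in base
min_threshold: the eager rule keeps every size below min_threshold windows, and
a new base window that fills a size "carries" into the next one. This framing
and the `add_base_window` helper are illustrative assumptions, not part of
Cassandra.

```python
from typing import List, Tuple


def add_base_window(counts: List[int], min_threshold: int) -> Tuple[List[int], int]:
    """Toy model: counts[k] is the number of windows of size min_threshold**k,
    excluding the incoming window (counts must be non-empty).

    Records one more base-size window (the former incoming window), applies
    the eager rule, and returns the new counts plus how many SSTables end up
    in the resulting compaction, assuming one SSTable per window.
    """
    counts = list(counts)
    counts[0] += 1
    level = 0
    merged_sstables = 0
    while counts[level] == min_threshold:
        # At level 0 all min_threshold SSTables are original ones; above that,
        # one of the windows is the product of the merge below (already counted).
        merged_sstables += min_threshold if level == 0 else min_threshold - 1
        counts[level] = 0
        if level + 1 == len(counts):
            counts.append(0)
        counts[level + 1] += 1
        level += 1
    return counts, merged_sstables


# Scenario from the text: 3 windows of size 1, 3 of size 4, 3 of size 16.
print(add_base_window([3, 3, 3], 4))  # ([0, 0, 0, 1], 10)
```

In this model, a carry that fills sizes 0 through L merges
min_threshold + (min_threshold - 1) * L SSTables, which for min_threshold = 4
and L = 2 gives the 10 above. Only one addition in min_threshold fills the
smallest size at all, and deeper carries are rarer still.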

These bigger compactions happen rarely, about as often as the similar event in
STCS where the number of SSTables occasionally drops very low. This tweak to
DTCS is meant to mimic that STCS behavior. It has been observed that DTCS
typically has 50% to 100% more SSTables than STCS, so this option is a way to
counter that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)