Björn Hegerfors created CASSANDRA-9013:
------------------------------------------
Summary: Add new option making DTCS unify larger time windows sooner
Key: CASSANDRA-9013
URL: https://issues.apache.org/jira/browse/CASSANDRA-9013
Project: Cassandra
Issue Type: Improvement
Reporter: Björn Hegerfors
Priority: Minor
In my very long post on CASSANDRA-6602, I mentioned a more aggressive windowing
strategy, which looks for opportunities to compact into larger SSTables sooner.
The original approach was that when we have min_threshold windows of the same
size and another one of the same size appears next to them, those windows (not
including the newest addition) merge. This new approach doesn't wait for a
(min_threshold+1)th window: as soon as min_threshold windows of one size exist,
they merge at once. The only exception is the "incoming window", which is kept
out of merges with other windows until it is no longer the incoming window.
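To make the difference between the two merge rules concrete, here is a minimal
Python sketch under simplifying assumptions: windows are tracked only by size
tier (tier k holds windows of min_threshold**k base units), each finalized
window holds exactly one SSTable, the incoming window is left out entirely, and
time alignment is ignored. The helpers finalize_window and simulate are
hypothetical and are not part of Cassandra's DTCS code.

```python
def finalize_window(counts, min_threshold, aggressive):
    """Record one newly finalized size-1 window, then apply a merge rule.

    counts[k] is the number of windows of size min_threshold**k
    (hypothetical model: one SSTable per window, incoming window excluded).
    """
    counts[0] += 1
    # Original rule: merge only once a (min_threshold+1)th same-size window
    # appears, leaving that newest window unmerged.
    # Aggressive rule: merge as soon as min_threshold same-size windows exist.
    trigger = min_threshold if aggressive else min_threshold + 1
    k = 0
    while counts[k] >= trigger:
        counts[k] -= min_threshold  # min_threshold windows become one larger window
        if k + 1 == len(counts):
            counts.append(0)
        counts[k + 1] += 1  # the merged window may complete the next tier too
        k += 1


def simulate(n, min_threshold, aggressive):
    """Return the number of live windows (SSTables) after each of n steps."""
    counts = [0]
    history = []
    for _ in range(n):
        finalize_window(counts, min_threshold, aggressive)
        history.append(sum(counts))
    return history


original = simulate(1000, 4, aggressive=False)
aggressive = simulate(1000, 4, aggressive=True)
print(sum(original) / len(original), sum(aggressive) / len(aggressive))
```

In this model the aggressive rule's window counts are just the digits of the
number of finalized windows written in base min_threshold, which is the
representation with the fewest windows, so it never holds more live SSTables
than the original rule.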
This does mean that, occasionally and intentionally, more than min_threshold
SSTables, not all of similar size, get compacted together. For example, let's
say min_threshold is 4 and we have 3 windows of size 16, 3 of size 4 and 3 of
size 1. When a 4th size-1 window appears that isn't the incoming window, the
four size-1 windows make a 4th size-4 window, which in turn makes a 4th size-16
window, so we immediately merge all of them into a single size-64 window.
Typically we expect one SSTable per window, with a file size corresponding to
the window size in some unit of measure. So in that scenario we merge roughly
10 SSTables (3 + 3 + 4) in one compaction.
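The worked example can be checked with a short cascade calculation. This is
again a hypothetical model, not DTCS code: cascade_merge takes the sizes of the
finalized, non-incoming windows (one SSTable each), adds one new window, applies
the aggressive rule and counts how many existing SSTables end up in the single
resulting compaction.

```python
from collections import Counter


def cascade_merge(sizes, new_size, min_threshold):
    """Add a finalized window of new_size and apply the aggressive merge rule.

    Hypothetical model: each entry in sizes is one finalized (non-incoming)
    window holding one SSTable, and window sizes grow by min_threshold per tier.
    Returns (size of the window holding the new data, SSTables compacted).
    """
    counts = Counter(sizes)
    counts[new_size] += 1
    size = new_size
    compacted = 0
    carried = 0  # 1 once this tier contains the output of the previous tier's merge
    while counts[size] >= min_threshold:
        counts[size] -= min_threshold
        # Count only real SSTables, not the window produced by the previous step.
        compacted += min_threshold - carried
        size *= min_threshold
        counts[size] += 1
        carried = 1
    return size, compacted


windows = [16] * 3 + [4] * 3 + [1] * 3
print(cascade_merge(windows, new_size=1, min_threshold=4))  # (64, 10)
```

The resulting size also matches the data volume, since 3*16 + 3*4 + 4*1 = 64.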
These bigger compactions happen rarely, about as often as something similar
happens in STCS, where the number of SSTables occasionally drops very low. This
tweak to DTCS is meant to mimic that STCS behavior. It has been observed that
DTCS typically has 50% to 100% more SSTables than STCS, and this option is a way
to counter that.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)