Hello everyone,

I would like to open the discussion on our proposal for a unified
compaction strategy that aims to solve well-known problems with compaction
and improve parallelism to permit higher levels of sustained write
throughput.

The proposal is here:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy

The strategy is based on two main observations:
- that tiered and levelled compaction can be generalized as the same thing
if one observes that both form exponentially-growing levels based on the
size of sstables (or non-overlapping sstable runs) and trigger a compaction
when more than a given number of sstables are present on one level;
- that instead of "size" in the description above we can use "density",
i.e. the size of an sstable divided by the width of the token range it
covers, which permits sstables to be split at arbitrary points when the
output of a compaction is written and still produce a levelled hierarchy.

The density-based approach allows us to split the compaction space into
progressively more shards as data moves to the higher levels of the
hierarchy, which improves parallelism and reduces the space requirements
and duration of compactions. The generalization allows us to cover the
existing strategies, as well as hybrid mixtures that can prove more
efficient for some workloads.
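
To make the density idea concrete, here is a minimal sketch of how an
sstable could be assigned to a level by density rather than size. This is
only an illustration, not the code in the proposal: the names (SSTable,
token_fraction, base_density, fanout, threshold) and the numbers are
assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SSTable:
    size: float            # on-disk size, in any consistent unit
    token_fraction: float  # share of the full token range covered, in (0, 1]


def density(sst):
    # Size divided by the width of the token range the sstable covers.
    return sst.size / sst.token_fraction


def level_of(sst, base_density, fanout):
    # Levels grow exponentially: level n holds densities in
    # [base_density * fanout**n, base_density * fanout**(n + 1)).
    # Anything below base_density * fanout lands on level 0.
    d = density(sst)
    level = 0
    while d >= base_density * fanout:
        d /= fanout
        level += 1
    return level


def levels_to_compact(sstables, base_density, fanout, threshold):
    # A level is due for compaction when it holds more than
    # `threshold` sstables (hypothetical trigger for this sketch).
    buckets = defaultdict(list)
    for sst in sstables:
        buckets[level_of(sst, base_density, fanout)].append(sst)
    return sorted(lvl for lvl, ssts in buckets.items() if len(ssts) > threshold)
```

For example, an sstable of size 1000 covering the whole token range and
an sstable of size 250 covering a quarter of it have the same density, so
both fall on the same level. This is what lets compaction output be split
at arbitrary token boundaries without disturbing the hierarchy.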

Thank you,
Branimir
