[
https://issues.apache.org/jira/browse/CASSANDRA-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei Deng updated CASSANDRA-12615:
---------------------------------
Labels: lcs (was: )
> Improve LCS compaction concurrency during L0->L1 compaction
> -----------------------------------------------------------
>
> Key: CASSANDRA-12615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12615
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Wei Deng
> Labels: lcs
> Attachments: L0_L1_inefficiency.jpg
>
>
> I've done multiple experiments with {{compaction-stress}} at 100GB, 200GB,
> 400GB and 600GB levels. These scenarios share a common pattern: at the
> beginning of the compaction they all overwhelm L0 with a lot (hundreds to
> thousands) of 128MB SSTables. One common observation I noticed from
> visualizing the compaction.log files from these tests is that initially after
> some massive STCS-in-L0 activities (could take up to 40% of total compaction
> time), L0->L1 always takes a really long time which frequently involves all
> of the bigger L0 SSTables (the results of many STCS compactions earlier) and
> all of the 10 L1 SSTables, and the output covers almost the full data set.
> Since L0->L1 can only happen single-threaded, we often spend close to 40% of
> the total compaction time in this L0->L1 stage, and only after this first
> really long L0->L1 finishes and 100s or 1000s of SSTables land on L1, can
> concurrent compactions at higher levels resume (to move the thousands of L1
> SSTables to higher levels). The attached snapshot demonstrates this
> observation.
> The question is, if this L0->L1 compaction is so big and can only happen
> single-threaded, and ends up generating thousands of L1 SSTables, most of
> which will have to up-level later anyway (as L1 can accommodate at most 10
> SSTables), why not start that L1+ up-level earlier, i.e. before this L0->L1
> compaction finishes.
> I can think of a few approaches: 1) break L0->L1 into smaller chunks if it
> realizes that the output of such L0->L1 compaction is going to far exceed the
> capacity of L1, this will allow each L0->L1 to finish sooner, and have the
> resulting L1 SSTables to be able to participate in higher up-level
> activities; 2) still treating the full L0->L1 as one big compaction session,
> but making the intermediate results (once the number of L1 SSTable output
> exceeds the L1 capacity) available for higher up-level activities.
> If we can somehow leverage more threads during this massive L0->L1 phase, we
> can save close to 40% of the total compaction time when L0 is initially
> backlogged, which will be a great improvement to our LCS compaction
> throughput.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)