[ 
https://issues.apache.org/jira/browse/CASSANDRA-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-12615:
---------------------------------
    Labels: lcs  (was: )

> Improve LCS compaction concurrency during L0->L1 compaction
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12615
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12615
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Wei Deng
>              Labels: lcs
>         Attachments: L0_L1_inefficiency.jpg
>
>
> I've done multiple experiments with {{compaction-stress}} at 100GB, 200GB, 
> 400GB and 600GB levels. These scenarios share a common pattern: at the 
> beginning of the compaction they all overwhelm L0 with a lot (hundreds to 
> thousands) of 128MB SSTables. One common observation I noticed from 
> visualizing the compaction.log files from these tests is that initially after 
> some massive STCS-in-L0 activities (could take up to 40% of total compaction 
> time), L0->L1 always takes a really long time which frequently involves all 
> of the bigger L0 SSTables (the results of many STCS compactions earlier) and 
> all of the 10 L1 SSTables, and the output covers almost the full data set. 
> Since L0->L1 can only happen single-threaded, we often spend close to 40% of 
> the total compaction time in this L0->L1 stage, and only after this first 
> really long L0->L1 finishes and 100s or 1000s of SSTables land on L1, can 
> concurrent compactions at higher levels resume (to move the thousands of L1 
> SSTables to higher levels). The attached snapshot demonstrates this 
> observation.
> The question is, if this L0->L1 compaction is so big and can only happen 
> single-threaded, and ends up generating thousands of L1 SSTables, most of 
> which will have to up-level later anyway (as L1 can accommodate at most 10 
> SSTables), why not start that L1+ up-level earlier, i.e. before this L0->L1 
> compaction finishes.
> I can think of a few approaches: 1) break L0->L1 into smaller chunks if it 
> realizes that the output of such L0->L1 compaction is going to far exceed the 
> capacity of L1, this will allow each L0->L1 to finish sooner, and have the 
> resulting L1 SSTables to be able to participate in higher up-level 
> activities; 2) still treating the full L0->L1 as one big compaction session, 
> but making the intermediate results (once the number of L1 SSTable output 
> exceeds the L1 capacity) available for higher up-level activities.
> If we can somehow leverage more threads during this massive L0->L1 phase, we 
> can save close to 40% of the total compaction time when L0 is initially 
> backlogged, which will be a great improvement to our LCS compaction 
> throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to