Wei Deng created CASSANDRA-12615:
------------------------------------

             Summary: Improve LCS compaction concurrency during L0->L1 
compaction
                 Key: CASSANDRA-12615
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12615
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: Wei Deng
         Attachments: L0_L1_inefficiency.jpg

I've done multiple experiments with {{compaction-stress}} at 100GB, 200GB, 
400GB and 600GB levels. These scenarios share a common pattern: at the 
beginning of the compaction they all overwhelm L0 with a lot (hundreds to 
thousands) of 128MB SSTables. One common observation I noticed from visualizing 
the compaction.log files from these tests is that initially after some massive 
STCS-in-L0 activities (could take up to 40% of total compaction time), L0->L1 
always takes a really long time which frequently involves all of the bigger L0 
SSTables (the results of many STCS compactions earlier) and all of the 10 L1 
SSTables, and the output covers almost the full data set. Since L0->L1 can only 
happen single-threaded, we often spend close to 40% of the total compaction 
time in this L0->L1 stage, and only after this first really long L0->L1 
finishes and 100s or 1000s of SSTables land on L1, can concurrent compactions 
at higher levels resume (to move the thousands of L1 SSTables to higher 
levels). The attached snapshot demonstrates this observation.

The question is, if this L0->L1 compaction is so big and can only happen 
single-threaded, and ends up generating thousands of L1 SSTables, most of which 
will have to up-level later anyway (as L1 can accommodate at most 10 SSTables), 
why not start that L1+ up-level earlier, i.e. before this L0->L1 compaction 
finishes.

I can think of a few approaches: 1) break L0->L1 into smaller chunks if it 
realizes that the output of such L0->L1 compaction is going to far exceed the 
capacity of L1, this will allow each L0->L1 to finish sooner, and have the 
resulting L1 SSTables to be able to participate in higher up-level activities; 
2) still treating the full L0->L1 as one big compaction session, but making the 
intermediate results (once the number of L1 SSTable output exceeds the L1 
capacity) available for higher up-level activities.

If we can somehow leverage more threads during this massive L0->L1 phase, we 
can save close to 40% of the total compaction time when L0 is initially 
backlogged, which will be a great improvement to our LCS compaction throughput.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to