Wei Deng created CASSANDRA-12526:
------------------------------------

             Summary: For LCS, single SSTable up-level is handled inefficiently
                 Key: CASSANDRA-12526
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12526
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction
            Reporter: Wei Deng


I'm using the latest trunk (as of August 2016, which is probably going to be 
3.10) to run some experiments on LeveledCompactionStrategy and noticed this 
inefficiency.

The test data is generated using cassandra-stress default parameters 
(keyspace1.standard1), so as you can imagine, it consists of a ton of newly 
inserted partitions that will never merge in compactions, which is probably the 
worst kind of workload for LCS (however, I'll detail later why this scenario 
should not be ignored as a corner case; for now, let's just assume we still 
want to handle this scenario efficiently).

After the compaction test was done, I searched debug.log for lines matching 
the "Compacted" summary so that I could see how long each individual compaction 
took and how many bytes it processed. The search pattern is as follows:

{noformat}
grep 'Compacted.*standard1' debug.log
{noformat}

Interestingly, I noticed that many of the finished compactions are marked as 
having *only one* SSTable involved. With the workload mentioned above, these 
"single SSTable" compactions actually make up the majority of all compactions 
(as shown below), so their efficiency can affect the overall compaction 
throughput quite a bit.

{noformat}
automaton@0ce59d338-1:~/cassandra-trunk/logs$ grep 'Compacted.*standard1' debug.log-test1 | wc -l
243
automaton@0ce59d338-1:~/cassandra-trunk/logs$ grep 'Compacted.*standard1' debug.log-test1 | grep ") 1 sstable" | wc -l
218
{noformat}
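
To make the ratio explicit, here is a minimal sketch of the same tally in Java. The class and method names are hypothetical (not part of Cassandra), and the only assumption about the log format is what the two grep patterns above already rely on.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a Cassandra class: tallies "Compacted" summary lines.
final class CompactionLogTally {
    // Same patterns as the grep commands above.
    private static final Pattern COMPACTED = Pattern.compile("Compacted.*standard1");
    private static final String SINGLE_SSTABLE = ") 1 sstable";

    // Number of lines that match the "Compacted ... standard1" summary pattern.
    static long total(List<String> lines) {
        return lines.stream()
                    .filter(l -> COMPACTED.matcher(l).find())
                    .count();
    }

    // Subset of those summary lines that also report a single input SSTable.
    static long single(List<String> lines) {
        return lines.stream()
                    .filter(l -> COMPACTED.matcher(l).find())
                    .filter(l -> l.contains(SINGLE_SSTABLE))
                    .count();
    }

    // Fraction of compactions that involved a single SSTable (0 if there were none).
    static double singleShare(long single, long total) {
        return total == 0 ? 0.0 : (double) single / total;
    }
}
```

With the counts above, singleShare(218, 243) comes out at roughly 0.897, i.e. about 90% of all compactions in this test involved a single SSTable.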

By looking at the code, it appears that there's a way to directly edit the 
level of a particular SSTable like the following:

{code}
sstable.descriptor.getMetadataSerializer().mutateLevel(sstable.descriptor, 
targetLevel);
sstable.reloadSSTableMetadata();
{code}
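
To illustrate the idea, here is a toy model of the decision, not actual Cassandra code. All types and names below are hypothetical; the level assignment stands in for the mutateLevel call plus metadata reload shown above, and the sketch assumes that a lone input SSTable being promoted has nothing to merge with.

```java
import java.util.List;

// Toy model only: these types are hypothetical and are not Cassandra classes.
final class UpLevelSketch {
    enum Action { MUTATE_LEVEL_ONLY, REWRITE }

    static final class ToySSTable {
        final long sizeBytes;
        int level;

        ToySSTable(long sizeBytes, int level) {
            this.sizeBytes = sizeBytes;
            this.level = level;
        }
    }

    // A single input moving to a higher level only needs its level changed;
    // anything else still goes through the normal read/write path.
    static Action plan(List<ToySSTable> inputs, int targetLevel) {
        boolean single = inputs.size() == 1;
        return single && targetLevel > inputs.get(0).level
                ? Action.MUTATE_LEVEL_ONLY
                : Action.REWRITE;
    }

    // Applies the plan and returns how many bytes had to be read and rewritten.
    static long apply(List<ToySSTable> inputs, int targetLevel) {
        if (plan(inputs, targetLevel) == Action.MUTATE_LEVEL_ONLY) {
            inputs.get(0).level = targetLevel; // stands in for mutateLevel + reload
            return 0L;
        }
        long rewritten = 0L;
        for (ToySSTable s : inputs) {
            rewritten += s.sizeBytes;          // full read/write I/O in the real path
            s.level = targetLevel;             // toy shortcut; the real path writes new files
        }
        return rewritten;
    }
}
```

In this model, promoting a single SSTable costs zero rewritten bytes regardless of its size, whereas the rewrite path costs the full size of every input.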

Compared to what we have now (reading the whole single SSTable from the old 
level and writing out the same single SSTable at the new level), the only 
difference I can think of with this approach is that the new SSTable will keep 
the same file name (sequence number) as the old one, which could break 
assumptions in other parts of the code. However, it avoids the full read/write 
I/O as well as the overhead of cleaning up the old file, creating the new file, 
and the extra churn in heap and file buffers, so the benefits seem to outweigh 
the inconvenience. I'd therefore argue that this JIRA is low-hanging fruit 
(LHF) and should be made available in 3.0.x as well.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
