[ https://issues.apache.org/jira/browse/CASSANDRA-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-4341:
----------------------------------------

    Attachment: 4341-fix.txt

This patch might make leveled compaction get stuck in an infinite compaction 
loop if compression is used and no more data comes in.

The problem is that if you have, say, 2 sstables in L0 that are not bigger 
than sstableMaxSize, we will compact them in L0 but might end up with 2 
sstables in L0 instead of 1. The reason this can happen is another problem 
that is older than this patch: when leveled compaction compacts, it splits 
sstables at sstableMaxSize of *uncompressed* data, but LeveledManifest (the 
patch on this ticket included) considers the level sizes to be *on-disk* 
sizes. So 2 sstables can be less than 10MB of on-disk size, yet compacting 
them will still generate 2 sstables because the uncompressed size is > 10MB.
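
To make the mismatch concrete, here is a minimal sketch with hypothetical 
numbers (the 2x compression ratio and the 4MB on-disk file sizes are 
assumptions for illustration, not taken from a real cluster):

{code:java}
public class SizeMismatch
{
    public static void main(String[] args)
    {
        long sstableMaxSize = 10L * 1024 * 1024; // 10 MB split target
        double compressionRatio = 0.5;           // assume data compresses 2x

        long onDiskEach = 4L * 1024 * 1024;      // two 4 MB sstables on disk
        long onDiskTotal = 2 * onDiskEach;       // 8 MB: manifest expects a single output file
        long uncompressedTotal = (long) (onDiskTotal / compressionRatio); // 16 MB

        // The writer splits on uncompressed bytes, so 16 MB of uncompressed
        // data still yields 2 output files; L0 never shrinks and we loop.
        long outputFiles = (uncompressedTotal + sstableMaxSize - 1) / sstableMaxSize;
        System.out.println(onDiskTotal + " bytes on disk -> " + outputFiles + " output sstables");
    }
}
{code}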

In theory there are two possible fixes for that:
# when we compact, use the on-disk size to decide where to split sstables;
# in LeveledManifest, consider level sizes in uncompressed data size instead 
of on-disk size.

I think the first solution is closer to the initial intention, in that we want 
the files on disk to be the size the user sets with sstableMaxSize. Besides, 
the second solution would artificially inflate the size of every level, which 
would make upgrades a bit painful since it would trigger a lot of compactions 
to rebalance the levels. So attaching a patch that does the first idea; a 
sketch of the split check is below. (I note that because our SequentialWriter 
buffers data before writing it, getting the on-disk file pointer gives us a 
position aligned on the buffer size, but I don't think that matters in this 
case, except that it makes it an error to have a SequentialWriter buffer size 
> compression block size.)
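
For what it's worth, here is a minimal sketch of what the split check looks 
like under the first solution. getOnDiskFilePointer() mirrors the 
SequentialWriter method mentioned above, but this is an illustration of the 
idea, not the attached patch:

{code:java}
// Decide whether the current output sstable is full and a new writer
// should be opened. Splitting on the compressed (on-disk) position
// instead of the uncompressed stream position makes the file sizes
// match what LeveledManifest measures.
private static boolean shouldSwitchWriter(SequentialWriter writer, long sstableMaxSize)
{
    // Because SequentialWriter buffers data, this position is aligned on
    // the buffer size, hence the requirement that the buffer size not
    // exceed the compression block size.
    return writer.getOnDiskFilePointer() > sstableMaxSize;
}
{code}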

There were two other problems with the committed patch:
* The edge case where compaction candidates in L0 were exactly sstableMaxSize 
was not handled correctly: the candidates would not be compacted with the L1 
sstables but would still be promoted.
* In the case where we had MAX_COMPACTING_L0 candidates, the code wasn't 
adding the overlapping sstables from L1 (see the sketch below).
The attached patch fixes these too.
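
A hedged sketch of the second fix, with getCandidatesInL0(), truncate(), 
overlapping() and getLevel() standing in for LeveledManifest internals (the 
helper names are assumptions, not the actual patch):

{code:java}
private Collection<SSTableReader> getL0CompactionCandidates()
{
    Collection<SSTableReader> candidates = getCandidatesInL0(); // hypothetical helper
    if (candidates.size() > MAX_COMPACTING_L0)
        candidates = truncate(candidates, MAX_COMPACTING_L0);   // hypothetical helper
    // The committed patch skipped this step on the capped path; without
    // the overlapping L1 sstables, compacting L0 into L1 can leave
    // overlapping sstables in L1.
    candidates.addAll(overlapping(candidates, getLevel(1)));
    return candidates;
}
{code}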

                
> Small SSTable Segments Can Hurt Leveling Process
> ------------------------------------------------
>
>                 Key: CASSANDRA-4341
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4341
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Benjamin Coverston
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: compaction
>             Fix For: 1.1.2
>
>         Attachments: 4341-fix.txt, 4341.txt
>
>
> This concerns:
> static int MAX_COMPACTING_L0 = 32;
> Repair can create very small SSTable segments. We should consider moving to a 
> threshold that takes into account the size of the files brought into 
> compaction rather than the number of files for this and similar situations. 
> Bringing the small files from L0 to L1 magnifies the issue.
> If there are too many very small files in L0, perhaps even an intermediate 
> compaction would reduce the magnifying effect of an L0 to L1 compaction.
