[jira] [Commented] (CASSANDRA-4310) Multiple independent Level Compactions in Parallel

Jonathan Ellis (JIRA) Fri, 21 Sep 2012 08:58:10 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460580#comment-13460580
 ]


Jonathan Ellis commented on CASSANDRA-4310:
-------------------------------------------

bq. If there are 2 or more sets to compact, then ParallelLeveledCompactionTask 
gets created with its own executor, and performs compaction in parallel

This looks unnecessary: we already have CompactionExecutor, created with the 
correct number of threads ({{concurrent_compactors}}).

Unfortunately I think I took a step backwards in CASSANDRA-2407, which is where 
we changed the API from {{List<CompactionTask> getBackgroundTasks}} to 
{{CompactionTask getNextBackgroundTask}}.

The latter made things more serial deliberately, since 2407 was trying to make 
STCS finish off small buckets before working on larger ones.  But this now 
looks like optimizing for the wrong thing.

That said, I'm not sure we need to switch back to the {{List}} API, since 
either way we need to make the candidate generation aware of what is already 
being compacted (submitBackground can get called for multiple flushes before 
it's done with the first set).  So what I would propose is to make 
CM.submitBackground loop until either

- there are no more idle executor threads, or
- {{getNextBackgroundTask}} returns null
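
A rough sketch of the loop I have in mind. This is not the actual CompactionManager code: the class name, the {{Strategy}} interface, and the slot counter are hypothetical stand-ins, and it assumes the strategy only hands out tasks that do not overlap with sstables already being compacted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only; names here are hypothetical, not Cassandra's real API. */
public class SubmitBackgroundSketch {

    /** Stand-in for a compaction strategy. */
    public interface Strategy {
        /** Next task that does not overlap in-flight work, or null if there is none. */
        Runnable getNextBackgroundTask();
    }

    private final int maxThreads;
    private final ExecutorService executor;
    // Number of executor threads currently reserved or running a task.
    private final AtomicInteger active = new AtomicInteger();

    public SubmitBackgroundSketch(int maxThreads) {
        this.maxThreads = maxThreads;
        // Daemon threads so the sketch never keeps a JVM alive.
        this.executor = Executors.newFixedThreadPool(maxThreads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
    }

    /** Submits tasks until no executor thread is idle or the strategy returns null. */
    public List<Future<?>> submitBackground(Strategy strategy) {
        List<Future<?>> submitted = new ArrayList<>();
        while (true) {
            // Reserve a slot first so we never pull a task we cannot run.
            if (active.incrementAndGet() > maxThreads) {
                active.decrementAndGet();
                break; // no more idle executor threads
            }
            Runnable task = strategy.getNextBackgroundTask();
            if (task == null) {
                active.decrementAndGet();
                break; // nothing left to compact right now
            }
            submitted.add(executor.submit(() -> {
                try {
                    task.run();
                } finally {
                    active.decrementAndGet(); // free the slot for a later call
                }
            }));
        }
        return submitted;
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

Reserving the slot atomically before asking for a task means concurrent calls (one per flush, say) cannot oversubscribe the executor, and a task is never taken from the strategy unless a thread is free to run it.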

                
> Multiple independent Level Compactions in Parallel
> --------------------------------------------------
>
>                 Key: CASSANDRA-4310
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4310
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: sankalp kohli
>            Assignee: Yuki Morishita
>              Labels: compaction, features, leveled, performance, ssd
>             Fix For: 1.2.1
>
>         Attachments: 4310.txt
>
>
> Problem: If you are inserting data into Cassandra and level compaction cannot 
> catch up, you will create a lot of files in L0.
> Here is a solution which will help here and also increase the performance of 
> level compaction.
> We can do many compactions in parallel for unrelated data.
> 1) For non-overlapping levels. Ex: when an L0 sstable is compacting with L1, we 
> can do compactions in other levels like L2 and L3 if they are eligible.
> 2) We can also do compactions with files in L1 which are not participating in 
> L0 compactions.
> This is especially useful if you are using SSDs and are not bottlenecked by IO.
> I am seeing this issue in my cluster. There are more than 50k pending 
> compactions and the disk usage is not that high (I am using SSDs).
> I have set multithreaded to true and am also not throttling the IO (the 
> value is set to 0).
>
