[ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-2191:
--------------------------------

    Attachment: 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt
                0001-Add-a-compacting-set-to-sstabletracker.txt

This patch adds a "compacting" set to the SSTableTracker. The set is modified 
atomically to schedule compactions, and SSTables are removed from it in a 
finally block.
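
As a rough illustration of the compacting-set idea described above (not the code from the attached patches), the sketch below claims a group of sstables atomically and releases them in a finally block. The class name CompactingSetSketch, its method names, and the use of plain strings to stand in for sstables are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the tracker's "compacting" set.
// Names and types are illustrative only; they are not taken from the patch.
final class CompactingSetSketch {
    private final Set<String> compacting = new HashSet<>();

    // Atomically claim every candidate, or none if any is already compacting.
    synchronized boolean markCompacting(Set<String> candidates) {
        if (candidates.isEmpty() || !Collections.disjoint(compacting, candidates)) {
            return false;
        }
        compacting.addAll(candidates);
        return true;
    }

    // Release sstables so that later compactions may pick them up.
    synchronized void unmarkCompacting(Set<String> sstables) {
        compacting.removeAll(sstables);
    }

    synchronized boolean isCompacting(String sstable) {
        return compacting.contains(sstable);
    }

    // Run the compaction body only if the claim succeeds.
    // The finally block releases the claim even if the body throws.
    boolean compact(Set<String> candidates, Runnable body) {
        if (!markCompacting(candidates)) {
            return false;
        }
        try {
            body.run();
            return true;
        } finally {
            unmarkCompacting(candidates);
        }
    }
}
```

Because the claim is all-or-nothing, two threads scheduling overlapping groups cannot both proceed, while disjoint groups can be compacted concurrently.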

It also converts the "compactionLock", which is only used by migrations (to 
completely stop compactions), to a read-write lock. Running compactions acquire 
it as readers; migrations acquire it as the writer.
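
The reader/writer split described above might look roughly like the following. This is a minimal sketch assuming java.util.concurrent's ReentrantReadWriteLock; the class CompactionLockSketch and its method names are hypothetical and are not the patch's actual code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper around a read-write "compactionLock".
// Class and method names are illustrative, not taken from the patch.
final class CompactionLockSketch {
    private final ReentrantReadWriteLock compactionLock = new ReentrantReadWriteLock();

    // Compactions take the read lock, so many of them can run at once.
    void runCompaction(Runnable compaction) {
        compactionLock.readLock().lock();
        try {
            compaction.run();
        } finally {
            compactionLock.readLock().unlock();
        }
    }

    // A migration takes the write lock: it waits for running compactions
    // to finish and blocks new ones until it is done.
    void runMigration(Runnable migration) {
        compactionLock.writeLock().lock();
        try {
            migration.run();
        } finally {
            compactionLock.writeLock().unlock();
        }
    }

    // Number of read locks currently held (that is, compactions in progress).
    int activeCompactions() {
        return compactionLock.getReadLockCount();
    }

    boolean migrationInProgress() {
        return compactionLock.isWriteLocked();
    }
}
```

With this arrangement compactions never block one another on the lock; only a migration forces exclusivity.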

Implications: up to #num-procs compactions will run at once, possibly within 
the same bucket, but likely in different buckets.

This patch goes hand in hand with CASSANDRA-2156, which ensures that despite 
our multithreading, we don't trample other operations on the system.

> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
> reasoning are different enough to open a separate issue.
>
> The problem with compactions currently is that they compact the set of 
> sstables that existed the moment the compaction started. This means that for 
> longer running compactions (even when running as fast as possible on the 
> hardware), a very large number of new sstables might be created in the 
> meantime. We have observed this proliferation of sstables killing performance 
> during major/high-bucketed compactions.
>
> One approach would be to pause compactions in upper buckets (containing 
> larger files) when compactions in lower buckets become possible. While this 
> would likely solve the problem with read performance, it does not actually 
> help us perform compaction any faster, which is a reasonable requirement for 
> other situations.
>
> Instead, we need to be able to perform any compactions that are currently 
> required in parallel, independent of what bucket they might be in.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
