[ https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015150#comment-13015150 ]

Aaron Morton commented on CASSANDRA-2191:
-----------------------------------------

4) Sounds reasonable if throttling is on.
 
6) I'm not familiar with the bloom filter optimization you mentioned. However, 
it seems that more than anything else the major flag in doCompaction() 
indicates whether the compaction is running on all sstables, regardless of how 
the process was triggered; i.e., the first ever minor compaction would also be 
marked as major by this logic. PrecompactedRow and LazilyCompactedRow will 
purge rows if the major flag is set or the key is only present in the sstables 
under compaction. I'm not sure why the extra check is there for minor 
compactions, but it looks like losing the fact that a major/manual compaction 
was started could change the purge behaviour. 
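For reference, the purge condition described above can be sketched roughly as follows (a simplified illustration with hypothetical names, not the actual PrecompactedRow/LazilyCompactedRow code):

```java
// Sketch of the purge decision discussed above: tombstones for a row may be
// dropped only when the compaction covers all sstables ("major"), or when
// the key is known to be absent from every sstable outside the compaction
// set. Names are illustrative, not the real Cassandra API.
public class PurgeCheck {
    public static boolean shouldPurge(boolean major, boolean keyInRemainingSSTables) {
        // A major compaction sees every sstable, so no other sstable can
        // hold an older version of the row; purging is always safe.
        if (major)
            return true;
        // For a minor compaction, purge only if the key does not appear in
        // any sstable outside the compaction set.
        return !keyInRemainingSSTables;
    }

    public static void main(String[] args) {
        System.out.println(shouldPurge(false, true)); // false: key exists elsewhere
        System.out.println(shouldPurge(false, false)); // true: key only in compacted sstables
    }
}
```

This is why losing the major flag matters: without it, the first branch is never taken and purging falls back entirely on the (possibly stale) key-containment check.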

I'm also trying to understand whether isKeyInRemainingSSTables() in the 
AbstractCompactedRow subclasses could be affected by multithreading. E.g., a 
CF with two buckets and a high min compaction threshold (so compactions run 
longer), two concurrent minor compactions (one in each bucket), and row A 
present in both buckets: if either thread processes row A before the other 
finishes, that thread would be stopped from purging the row. Is there a race 
condition that stops both threads purging the row?
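To make the scenario concrete, here is a small model of it (a hypothetical simulation, not Cassandra code; sstable names and the keyInRemaining() helper are made up for illustration):

```java
import java.util.*;

// Models the concurrent-minor-compaction question above: row A lives in
// sstable s1 (bucket 1) and s2 (bucket 2); two compactions run at once,
// one per bucket. Each checks whether the key exists in sstables OUTSIDE
// its own compaction set before purging.
public class ConcurrentPurgeScenario {
    static boolean keyInRemaining(Set<String> keyLocations, Set<String> all, Set<String> compacting) {
        Set<String> remaining = new HashSet<>(all);
        remaining.removeAll(compacting);
        for (String sstable : remaining)
            if (keyLocations.contains(sstable))
                return true;
        return false;
    }

    public static void main(String[] args) {
        Set<String> allSSTables = new HashSet<>(Arrays.asList("s1", "s2"));
        Set<String> rowALocations = new HashSet<>(Arrays.asList("s1", "s2"));

        // Compaction 1 holds {s1}; it sees row A in the remaining {s2}.
        boolean purge1 = !keyInRemaining(rowALocations, allSSTables, Collections.singleton("s1"));
        // Compaction 2 holds {s2}; it sees row A in the remaining {s1}.
        boolean purge2 = !keyInRemaining(rowALocations, allSSTables, Collections.singleton("s2"));

        // Each compaction sees the key in the sstable the OTHER one holds,
        // so neither purges: the outcome is conservative (no purge), not a
        // double purge.
        System.out.println(purge1 + " " + purge2); // false false
    }
}
```

Under this model the race is at least safe in direction: both threads decline to purge rather than both purging, though the tombstone then survives both compactions.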

9) We do not use the value in the compactions map; could we set it to the 
current system time when beginCompaction() is called and use that to sort 
the list? Not a biggie. 



> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-DataTracker.txt, 
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
> 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt, 
> 0004-Try-harder-to-close-scanners-in-compaction-close.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
> reasoning are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of 
> sstables that existed the moment the compaction started. This means that for 
> longer running compactions (even when running as fast as possible on the 
> hardware), a very large number of new sstables might be created in the 
> meantime. We have observed this proliferation of sstables killing performance 
> during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing 
> larger files) when compactions in lower buckets become possible. While this 
> would likely solve the problem with read performance, it does not actually 
> help us perform compaction any faster, which is a reasonable requirement for 
> other situations.
> Instead, we need to be able to perform any compactions that are currently 
> required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
