[jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets

Sylvain Lebresne (JIRA) Mon, 11 Apr 2011 02:14:03 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018251#comment-13018251
 ]


Sylvain Lebresne commented on CASSANDRA-2191:
---------------------------------------------

For the record:

bq. Inlined stopTheWorld in 0005. Yes, I agree that the name sucked, but 
whether or not it is possible for a lock acquisition to fail on a server that 
is not already screwed, and whether an abstraction is in order here is still up 
for debate

I do like the inlined version much more. I did not pretend that the previous 
version wasn't working. It was just hard to check that the unmarking was 
happening correctly, and even though I agree lock acquisition is unlikely to 
fail, it would have been easy for someone else to add lines inside stopTheWorld 
at the wrong place that could fail. And the name sucked :)
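
To illustrate why the inlined form is easier to audit, here is a minimal 
sketch, not the code from patch 0005. The class, field and method names 
(InlinedExclusiveTask, compactionLock, compacting, runExclusive) are 
hypothetical, and sstables are modelled as plain strings. The point is only 
that marking and unmarking sit next to each other in one try/finally:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: names are illustrative, not taken from the patch.
public class InlinedExclusiveTask {
    private final ReentrantReadWriteLock compactionLock = new ReentrantReadWriteLock();
    private final Set<String> compacting = new HashSet<>();

    /** Runs the task under the write lock with the given sstables marked as compacting. */
    public void runExclusive(Set<String> sstables, Runnable task) {
        compactionLock.writeLock().lock();
        try {
            // Mark, then enter a try block straight away so that anything
            // failing after this point still reaches the unmarking below.
            compacting.addAll(sstables);
            try {
                task.run();
            } finally {
                compacting.removeAll(sstables); // unmark even if the task throws
            }
        } finally {
            compactionLock.writeLock().unlock();
        }
    }

    /** Reports whether the given sstable is currently marked as compacting. */
    public boolean isCompacting(String sstable) {
        compactionLock.readLock().lock();
        try {
            return compacting.contains(sstable);
        } finally {
            compactionLock.readLock().unlock();
        }
    }
}
```

With this shape, a reviewer can see the mark, the unmark and the unlock in 
a single method.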

bq. Added an AtomicBoolean to AutoSavingCache in 0006. I reeeally think this 
should go to the flush stage, since the tasks have almost identical lifetimes, 
and we don't really need progress for either of them

I just don't want cache saving to block flush for too long. So I'm not saying 
it should never go to the flush stage, but I'm uncomfortable putting it there 
without some proper testing of its impact. We could make the flush stage 
multithreaded (with throttling), and then I would have no problem with moving 
cache saving there (but we would still have to make sure only one save happens 
at a time).
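
For reference, the "only one save at a time" guarantee can be had from an 
AtomicBoolean regardless of which stage runs the task. The following is a 
rough sketch of that idea, not the code from patch 0006; SingleSaveGuard and 
trySave are hypothetical names, and I am assuming that a save requested while 
another is in flight is simply skipped:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the cache-saving path; not the AutoSavingCache code.
public class SingleSaveGuard {
    private final AtomicBoolean saving = new AtomicBoolean(false);

    /** Runs the save task unless one is already in flight; returns whether it ran. */
    public boolean trySave(Runnable saveTask) {
        // compareAndSet lets exactly one caller flip the flag from false to true.
        if (!saving.compareAndSet(false, true))
            return false; // another save is in progress, so skip this one
        try {
            saveTask.run();
            return true;
        } finally {
            saving.set(false); // always release, even if the save throws
        }
    }
}
```

A guard like this would still be needed with a multithreaded flush stage, 
since two save tasks could otherwise be picked up by different threads.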

> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-DataTracker.txt, 
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
> 0003-Expose-multiple-compactions-via-JMX-and-a-concrete-ser.txt, 
> 0004-Allow-multithread-compaction-to-be-disabled.txt, 
> 0005-Acquire-the-writeLock-for-major-cleanup-scrub-in-order.txt, 
> 0006-Prevent-cache-saves-from-occuring-concurrently.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
> reasoning are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of 
> sstables that existed the moment the compaction started. This means that for 
> longer running compactions (even when running as fast as possible on the 
> hardware), a very large number of new sstables might be created in the 
> meantime. We have observed this proliferation of sstables killing performance 
> during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing 
> larger files) when compactions in lower buckets become possible. While this 
> would likely solve the problem with read performance, it does not actually 
> help us perform compaction any faster, which is a reasonable requirement for 
> other situations.
> Instead, we need to be able to perform any compactions that are currently 
> required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
