[ https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015150#comment-13015150 ]
Aaron Morton commented on CASSANDRA-2191:
-----------------------------------------
4) Sounds reasonable if throttling is on.
6) I'm not familiar with the bloom filter optimization you mentioned. However,
it seems that, more than anything else, the major flag in doCompaction()
indicates whether the compaction is running on all sstables, regardless of how
the process was triggered; i.e. the first ever minor compaction would also be
marked as major by this logic. PrecompactedRow and LazilyCompactedRow will
purge rows if the major flag is set or the key is only present in the sstables
under compaction. I'm not sure why the extra check is there for minor
compactions, but it looks like losing the fact that a major/manual compaction
was started could change the purge behaviour.
I'm also trying to understand whether isKeyInRemainingSSTables() in the
AbstractCompactedRow subclasses could be affected by multithreading. For
example: a CF with two buckets and a high min compaction threshold (so
compactions run longer), two concurrent minor compactions, one in each bucket,
and row A present in both buckets. If either thread processes row A before the
other finishes, that would stop that thread from purging the row. Is there a
race condition that stops both threads from purging the row?
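To make the question concrete, here is a minimal Java sketch of the purge rule
as I read it, not the actual Cassandra code. PurgeSketch, SSTable and
shouldPurge() are hypothetical names for illustration; only
isKeyInRemainingSSTables() and the major flag come from the patch, and the
signatures here are assumptions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PurgeSketch {
    /** Hypothetical stand-in for an sstable: a name plus the row keys it holds. */
    static final class SSTable {
        final String name;
        final Set<String> keys;

        SSTable(String name, String... keys) {
            this.name = name;
            this.keys = new HashSet<>(Arrays.asList(keys));
        }
    }

    /** True if any live sstable outside the compacting set still holds the key. */
    static boolean isKeyInRemainingSSTables(String key,
                                            Collection<SSTable> live,
                                            Collection<SSTable> compacting) {
        for (SSTable s : live) {
            if (!compacting.contains(s) && s.keys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    /** Purge if the compaction is major, or if no other sstable has the key. */
    static boolean shouldPurge(boolean major, String key,
                               Collection<SSTable> live,
                               Collection<SSTable> compacting) {
        return major || !isKeyInRemainingSSTables(key, live, compacting);
    }

    public static void main(String[] args) {
        SSTable small = new SSTable("bucket-1", "A", "B");
        SSTable large = new SSTable("bucket-2", "A", "C");
        List<SSTable> live = Arrays.asList(small, large);

        // Two concurrent minor compactions, one per bucket, both see the same live set.
        System.out.println(shouldPurge(false, "A", live, Arrays.asList(small))); // false
        System.out.println(shouldPurge(false, "A", live, Arrays.asList(large))); // false
        // A compaction over every sstable is flagged major and may purge.
        System.out.println(shouldPurge(true, "A", live, live));                  // true
    }
}
```

In this simplified model neither minor compaction purges row A while the other
bucket's sstables are still live, which is the case I am asking about. Whether
the real code behaves this way depends on when the set of remaining sstables is
read.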
9) We do not use the value in the compactions map. Could we set it to the
current system time when beginCompaction() is called and use that to sort
the list? Not a biggie.
> Multithread across compaction buckets
> -------------------------------------
>
> Key: CASSANDRA-2191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Stu Hood
> Assignee: Stu Hood
> Priority: Critical
> Labels: compaction
> Fix For: 0.8
>
> Attachments: 0001-Add-a-compacting-set-to-DataTracker.txt,
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt,
> 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt,
> 0004-Try-harder-to-close-scanners-in-compaction-close.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and
> reasoning are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of
> sstables that existed the moment the compaction started. This means that for
> longer running compactions (even when running as fast as possible on the
> hardware), a very large number of new sstables might be created in the
> meantime. We have observed this proliferation of sstables killing performance
> during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing
> larger files) when compactions in lower buckets become possible. While this
> would likely solve the problem with read performance, it does not actually
> help us perform compaction any faster, which is a reasonable requirement for
> other situations.
> Instead, we need to be able to perform any compactions that are currently
> required in parallel, independent of what bucket they might be in.