[ https://issues.apache.org/jira/browse/CASSANDRA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-1881.
---------------------------------------
Resolution: Won't Fix
Concurrent compaction was added in CASSANDRA-2191. I see little benefit (and a
lot of complexity) in rewriting compaction as what is basically a pool of async
compaction threads.
> support concurrent "tiered" compaction
> --------------------------------------
>
> Key: CASSANDRA-1881
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1881
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Peter Schuller
> Priority: Minor
>
> (this has been discussed on the mailing lists before; I am filing it now so
> that there is a ticket to refer to on the wiki)
>
> CASSANDRA-1876 is open to allow parallel compaction for the sake of
> throughput. However, that only addresses one aspect of why parallel
> compaction is useful; the other half is ensuring that compaction
> proceeds in a timely fashion at each "size tier" (for lack of a better
> term).
>
> Essentially, CASSANDRA-1876 is about CPU concurrency, while this is about
> functional concurrency. I propose that compaction be a process that performs
> some amount of compaction work per second (I'm thinking ahead to future rate
> limiting; that's another ticket to be filed). That work has to be spread out
> over multiple compaction tiers in a way that is not coupled to CPU
> concurrency.
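Performing "some amount of compaction work per second" amounts to a throughput throttle shared by all compaction threads. A minimal Python sketch under stated assumptions: the class `CompactionThrottle` and its parameters are hypothetical (not Cassandra's API), work is measured in bytes, and each worker reports every chunk it compacts; whenever the combined work is ahead of the configured rate, the reporting worker sleeps until the schedule catches up.

```python
import threading
import time
from typing import Callable


class CompactionThrottle:
    """Hypothetical throttle capping total compaction work at bytes_per_second.

    Workers report each chunk of work via acquire(); if the combined work is
    ahead of the configured rate, acquire() sleeps until the schedule catches up.
    """

    def __init__(self, bytes_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if bytes_per_second <= 0:
            raise ValueError("bytes_per_second must be positive")
        self._rate = bytes_per_second
        self._clock = clock
        self._sleep = sleep
        self._start = clock()
        self._done = 0  # total bytes reported by all workers so far
        self._lock = threading.Lock()

    def acquire(self, nbytes: int) -> float:
        """Account for nbytes of work; return how many seconds the caller slept."""
        with self._lock:
            self._done += nbytes
            # Earliest moment at which this much total work is allowed at the rate.
            allowed_at = self._start + self._done / self._rate
        delay = allowed_at - self._clock()
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other workers can report
            return delay
        return 0.0


# Usage (hypothetical): a worker calls throttle.acquire(len(chunk)) after merging each chunk.
```

Because the schedule is anchored at the start time, an idle period builds up credit that a later burst can spend at full speed; a real implementation might cap that credit.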
>
> The suggested solution is to have N concurrent compaction threads running
> at any given moment (CASSANDRA-1876), but to have those threads perform
> work for a variable number of compaction jobs. Compactions would be
> triggered by sets of similarly sized sstables, as before, but each such
> compaction would be a compaction "job" that is independent of any actual
> compaction thread.
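Turning groups of similarly sized sstables into thread-independent jobs could look roughly like the Python sketch below. It assumes a bucketing rule in which an sstable joins the current bucket if its size lies between 0.5x and 1.5x of that bucket's running average, and that a bucket becomes a job once it holds at least 4 sstables; these thresholds and the names `bucket_by_size`, `make_jobs` and `CompactionJob` are illustrative assumptions. A job only records what needs compacting; no thread is attached to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompactionJob:
    """A unit of compaction work, independent of any particular thread (hypothetical)."""
    sstable_sizes: List[int]
    remaining: int = field(init=False)  # bytes still to be compacted

    def __post_init__(self) -> None:
        self.remaining = sum(self.sstable_sizes)


def bucket_by_size(sizes: List[int], low: float = 0.5, high: float = 1.5) -> List[List[int]]:
    """Group sstable sizes into buckets of 'similar' size.

    Sizes are visited in ascending order; a size joins the current bucket if it
    lies within [low * avg, high * avg] of that bucket's running average,
    otherwise it starts a new bucket.
    """
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        if buckets:
            current = buckets[-1]
            avg = sum(current) / len(current)
            if low * avg <= size <= high * avg:
                current.append(size)
                continue
        buckets.append([size])
    return buckets


def make_jobs(sizes: List[int], min_threshold: int = 4) -> List[CompactionJob]:
    """Turn every bucket with at least min_threshold sstables into a job."""
    return [CompactionJob(b) for b in bucket_by_size(sizes) if len(b) >= min_threshold]
```

For example, sizes `[10, 11, 12, 13, 100, 110, 120, 130, 1000]` fall into three buckets, and only the first two are large enough to become jobs.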
>
> Compaction threads move between compaction jobs at a coarse granularity, so
> that synchronization overhead is irrelevant (for example, a thread might go
> and look for other work every memtable_throughput_in_mb megabytes). Smaller
> compaction jobs take priority over larger jobs. This is intended to keep
> sstable counts down and to ensure that the larger jobs are always the ones
> that have to wait, since their size means they are not latency sensitive
> anyway.
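The thread/job decoupling and the smaller-first priority described above can be sketched in Python as a shared priority queue of jobs. Assumptions: `Job`, `TieredScheduler` and `worker` are hypothetical names, and `slice_bytes` stands in for the coarse granularity the proposal suggests (e.g. memtable_throughput_in_mb megabytes). A worker claims the smallest pending job, does one slice of it, then requeues the job if it is unfinished and looks for work again, so a newly triggered small job overtakes a large one that is already in progress.

```python
import heapq
import itertools
import threading
from typing import List, Optional, Tuple


class Job:
    """Hypothetical compaction job: `total` bytes of input, `remaining` still to do."""

    def __init__(self, name: str, total: int) -> None:
        self.name = name
        self.total = total
        self.remaining = total


class TieredScheduler:
    """Hands out coarse slices of work, always from the smallest pending job."""

    def __init__(self, slice_bytes: int) -> None:
        self.slice_bytes = slice_bytes
        self._heap: List[Tuple[int, int, Job]] = []
        self._seq = itertools.count()  # tie-breaker so Job objects are never compared
        self._lock = threading.Lock()

    def submit(self, job: Job) -> None:
        with self._lock:
            heapq.heappush(self._heap, (job.total, next(self._seq), job))

    def take(self) -> Optional[Tuple[Job, int]]:
        """Claim up to slice_bytes of the smallest job; None if nothing is pending.

        A claimed job is out of the heap, so only one thread works on it at a time.
        """
        with self._lock:
            if not self._heap:
                return None
            _, _, job = heapq.heappop(self._heap)
        return job, min(self.slice_bytes, job.remaining)

    def finish_slice(self, job: Job, nbytes: int) -> None:
        """Record progress and requeue the job if it is not done yet."""
        job.remaining -= nbytes
        if job.remaining > 0:
            self.submit(job)


def worker(sched: TieredScheduler, log: List[str], log_lock: threading.Lock) -> None:
    """Repeatedly take a slice, 'compact' it, and return the job to the queue."""
    while (claim := sched.take()) is not None:
        job, nbytes = claim
        # A real worker would merge nbytes of the job's sstables here.
        with log_lock:
            log.append(job.name)
        sched.finish_slice(job, nbytes)
```

Ordering by a job's total size rather than its remaining bytes is a deliberate choice in this sketch: a large job that is nearly finished still yields to small jobs, which matches the stated goal of making the large jobs wait.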
>
> The primary downside is that disk usage spikes would much more easily reach
> "double cf size" levels when many compactions are running. This can probably
> be mitigated by CASSANDRA-1608, with its talk of limited sstable sizes.