Making compaction parallel isn't a priority because the problem is
almost always the opposite: how do we spread compaction out over a
longer period of time, instead of having sharp spikes of activity that
hurt read/write latency?  I'd be very surprised if latency were
acceptable with parallel compaction.  In other words, your real
problem is that you need more capacity for your workload.
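
To make "spreading it out" concrete, here is a minimal sketch of a
rate-limited merge. It is not Cassandra code: `Throttle`,
`throttled_compact`, and the in-memory lists of (key, value) rows
standing in for sstables are all hypothetical names chosen for
illustration, and the byte accounting is deliberately crude.

```python
import heapq
import time


class Throttle:
    """Caps throughput at `bytes_per_sec` by sleeping when ahead of schedule."""

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.total = 0

    def consume(self, nbytes):
        self.total += nbytes
        # Earliest time at which `total` bytes are allowed to have been processed.
        due = self.start + self.total / self.bytes_per_sec
        delay = due - self.clock()
        if delay > 0:
            self.sleep(delay)


def throttled_compact(sstables, throttle):
    """Merge sorted (key, value) runs; later runs are assumed newer and win."""
    # Tag each row with its run index so duplicates come out oldest first.
    tagged = [
        [(key, idx, value) for key, value in run]
        for idx, run in enumerate(sstables)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)  # newer run overrides the older row
        else:
            merged.append((key, value))
        # Assumption: row size is approximated by len(key) + len(value).
        throttle.consume(len(key) + len(value))
    return merged
```

The point of the sketch is the trade-off: a lower `bytes_per_sec`
makes the merge take longer but leaves more headroom for reads and
writes while it runs.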

On Thu, Nov 25, 2010 at 5:18 PM, Carlos Alvarez <cbalva...@gmail.com> wrote:
>> When you say that it grows constantly, does that mean up to 30 or even
>> further?
>
> My total data size is 2TB
>
> Actually, I never see the count stabilize.  When it reached 30 I
> thought "I am reaching the default upper limit for a compaction,
> something went wrong" and I went back to 1GB memtables (I also saw
> higher read latencies).
>
> Well, I think you are right: I am CPU bound on compaction, because
> during compactions I see a single JVM thread that is in the running
> state almost all the time, and the disk is not used beyond 50%.
>
>
>> (A nice future improvement would be to allow for concurrent compaction
>> so that Cassandra would be able to utilize multiple CPU cores which
>> may mitigate this if you have left-over CPU. However, this is not
>> currently supported.)
>
> Yes, sure. I'd be happy to test, but I don't dare to alter the code :-)
>
> I think that a partial solution would help: if the compaction wrote
> 'n' different new sstables (not one), the implementation would be
> easier. I mean, the compaction would compact, for instance, 10
> sstables into 2 (2 being the level of parallelism). In this way, the
> sstable count would eventually remain stable (although higher). What
> do you think?
>
>
> Carlos.
>
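
For reference, the "compact to n sstables" idea quoted above could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not how Cassandra works: the split into outputs is done
by key range (the proposal does not say how the outputs would be
divided), sstables are modeled as sorted in-memory lists of
(key, value) rows, and `merge_range` and `compact_to_n` are
hypothetical names. Threads are used only to show the structure; in
CPython they would not actually run a CPU-bound merge in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def merge_range(sstables, lo, hi):
    """Merge rows with lo <= key < hi (None means unbounded); newest run wins."""
    tagged = [
        [(k, i, v) for k, v in run
         if (lo is None or k >= lo) and (hi is None or k < hi)]
        for i, run in enumerate(sstables)
    ]
    out = {}
    # Rows arrive in key order, oldest run first, so later writes override.
    for k, _, v in heapq.merge(*tagged):
        out[k] = v
    return list(out.items())


def compact_to_n(sstables, split_keys):
    """Produce len(split_keys) + 1 output sstables, one per key range."""
    bounds = [None, *split_keys, None]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [
            pool.submit(merge_range, sstables, lo, hi) for lo, hi in ranges
        ]
        return [f.result() for f in futures]
```

Each worker reads every input but writes only its own key range, so
the outputs never overlap and no coordination is needed between
workers beyond agreeing on the split keys.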



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
