The speed of compaction isn't the problem.  The problem is that a sustained 
read and write load causes compaction to fall behind.

You could solve the problem by throttling reads and writes so compaction isn't 
starved.  (Maybe just the writes.  I'm not sure.)

Different nodes will have different compaction backlogs, so you'd want to do 
this on a per-node basis, after Cassandra has decided how to replicate each 
operation.  For example, Cassandra could observe the number of pending 
compaction tasks and sleep that many milliseconds before every read and write.
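A minimal sketch of what I have in mind (the class and method names here are 
made up for illustration; I haven't looked at how this would actually hook 
into Cassandra's read/write path or its compaction manager):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical back-pressure sketch: sleep one millisecond per pending
// compaction task before serving each read or write.  This is not actual
// Cassandra code; in Cassandra the backlog would come from the compaction
// manager's pending-task count rather than a plain counter.
public final class CompactionBackpressure {
    private final AtomicInteger pendingCompactions = new AtomicInteger();

    // Stand-in for whatever reports the real compaction backlog.
    public void setPendingCompactions(int n) {
        pendingCompactions.set(n);
    }

    /** Call before each read/write: stall proportionally to the backlog. */
    public void throttle() throws InterruptedException {
        int backlog = pendingCompactions.get();
        if (backlog > 0) {
            Thread.sleep(backlog); // 1 ms per pending compaction task
        }
    }
}
```

With zero backlog this adds no delay, and the delay grows linearly as 
compaction falls behind, so clients slow down instead of the backlog growing 
without bound.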

The status quo is that I can count a load test as passing only if the amount 
of backlogged compaction work stays under some bound.  I'd rather not have to 
peer into Cassandra's internals to determine whether it's really working.  
It's a problem if 16-hour load tests get different results than 1-hour load 
tests, because I'm renting my test cluster by the hour.

Tim Freeman
Email: [email protected]
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and 
Thursday; call my desk instead.)

-----Original Message-----
From: Jonathan Ellis [mailto:[email protected]] 
Sent: Thursday, December 03, 2009 3:06 PM
To: [email protected]
Subject: Re: Persistently increasing read latency

Thanks for looking into this.  Doesn't seem like there's much
low-hanging fruit to make compaction faster but I'll keep that in the
back of my mind.

-Jonathan

On Thu, Dec 3, 2009 at 4:58 PM, Freeman, Tim <[email protected]> wrote:
>>So this is working as designed, but the design is poor because it
>>causes confusion.  If you can open a ticket for this that would be
>>great.
>
> Done, see:
>
>   https://issues.apache.org/jira/browse/CASSANDRA-599
>
>>What does iostat -x 10 (for instance) say about the disk activity?
>
> rkB/s is consistently high, and wkB/s varies.  This is a typical entry with 
> wkB/s at the high end of its range:
>
>>avg-cpu:  %user   %nice    %sys %iowait   %idle
>>           1.52    0.00    1.70   27.49   69.28
>>
>>Device:    rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>sda          3.10 3249.25 124.08 29.67 26299.30 26288.11 13149.65 13144.06   342.04    17.75   92.25   5.98  91.92
>>sda1         0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>sda2         3.10 3249.25 124.08 29.67 26299.30 26288.11 13149.65 13144.06   342.04    17.75   92.25   5.98  91.92
>>sda3         0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
> and at the low end:
>
>>avg-cpu:  %user   %nice    %sys %iowait   %idle
>>           1.50    0.00    1.77   25.80   70.93
>>
>>Device:    rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>sda          3.40  817.10 128.60 17.70 27828.80  6600.00 13914.40  3300.00   235.33     6.13   56.63   6.21  90.81
>>sda1         0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>sda2         3.40  817.10 128.60 17.70 27828.80  6600.00 13914.40  3300.00   235.33     6.13   56.63   6.21  90.81
>>sda3         0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
>
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:[email protected]]
> Sent: Thursday, December 03, 2009 2:45 PM
> To: [email protected]
> Subject: Re: Persistently increasing read latency
>
> On Thu, Dec 3, 2009 at 4:34 PM, Freeman, Tim <[email protected]> wrote:
>>>Can you tell if the system is i/o or cpu bound during compaction?
>>
>> It's I/O bound.  It's using ~9% of 1 of 4 cores as I watch it, and all it's 
>> doing right now is compactions.
>
> What does iostat -x 10 (for instance) say about the disk activity?
>
