Re: Compactions nice to have features

Vladimir Rodionov Sun, 05 Oct 2014 14:12:59 -0700

>> A few comments:
>> - bulkload - you mean not by loading pre-created HFiles? If you do that
>> there would be no compaction during the import as the files are simply
>> moved into place.


Bulk load is not always convenient or feasible. We do support batched
mutations in the API, but loading data that way still triggers compactions,
which is a serious issue. Cassandra allows you to disable/enable compactions
(I think it's cluster-wide, not sure though), so why shouldn't we have that too?
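To make the proposal a bit more concrete, here is a rough Java sketch of a switch that covers all three scopes (cluster, table, CF) and both minor and major compactions. CompactionSwitch, mayCompact and runWithTableCompactionsDisabled are hypothetical names for illustration, not an existing HBase API, and the state here is in-process only; a real version would presumably have to live in table/CF metadata so every region server sees it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical compaction on/off switch at cluster, table and column-family scope.
// Not an HBase API; names and semantics are illustrative assumptions.
class CompactionSwitch {
    private boolean clusterEnabled = true;
    private final Set<String> disabledTables = new HashSet<>();
    private final Set<String> disabledFamilies = new HashSet<>(); // keys are "table:family"

    synchronized void setClusterEnabled(boolean enabled) {
        clusterEnabled = enabled;
    }

    synchronized void setTableEnabled(String table, boolean enabled) {
        if (enabled) disabledTables.remove(table); else disabledTables.add(table);
    }

    synchronized void setFamilyEnabled(String table, String family, boolean enabled) {
        String key = table + ":" + family;
        if (enabled) disabledFamilies.remove(key); else disabledFamilies.add(key);
    }

    // A store may compact (minor or major) only if no enclosing scope has compactions off.
    synchronized boolean mayCompact(String table, String family) {
        return clusterEnabled
                && !disabledTables.contains(table)
                && !disabledFamilies.contains(table + ":" + family);
    }

    // Disable compactions for one table, run the load, then restore the previous
    // setting even if the load throws.
    void runWithTableCompactionsDisabled(String table, Runnable load) {
        boolean wasEnabled;
        synchronized (this) {
            wasEnabled = !disabledTables.contains(table);
        }
        setTableEnabled(table, false);
        try {
            load.run();
        } finally {
            setTableEnabled(table, wasEnabled);
        }
    }
}
```

The disable / load / enable sequence from my original mail then becomes a single runWithTableCompactionsDisabled("T", loader) call; the try/finally means a failed load does not leave the table without compactions.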

>> - local compaction IO limit. Limiting the number of compaction threads (1
>> by default) is not good enough ... ? You can cause too much harm even with
>> a single thread compacting per region server?

This one I am not sure about myself. The idea is to make compaction nicer to
other I/O. For example, read operations and memstore flushes should have
higher priority than compaction I/O. One way is to limit (throttle)
compaction bandwidth locally; there are other approaches as well.
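For the local throttle, one possible shape is a token bucket that only compaction I/O passes through, so reads and memstore flushes are never delayed by it. This is a minimal sketch under that assumption; CompactionThrottle and its parameters are made up for illustration.

```java
// Hypothetical per-RegionServer compaction I/O throttle (a token bucket).
// Assumption: only compaction reads/writes go through it; flushes and client
// reads bypass it, which is one simple way to give them priority.
class CompactionThrottle {
    private final double bytesPerNano; // refill rate
    private final double capacity;     // maximum burst, in bytes
    private double tokens;
    private long lastNanos;

    CompactionThrottle(long bytesPerSecond, long burstBytes, long startNanos) {
        this.bytesPerNano = bytesPerSecond / 1e9;
        this.capacity = burstBytes;
        this.tokens = burstBytes;
        this.lastNanos = startNanos;
    }

    // Charge `bytes` of compaction I/O at time `nowNanos` and return how many
    // nanoseconds the compaction thread should sleep before continuing.
    synchronized long acquire(long bytes, long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastNanos);
        tokens = Math.min(capacity, tokens + elapsed * bytesPerNano);
        lastNanos = Math.max(lastNanos, nowNanos);
        tokens -= bytes; // may go negative; the debt is repaid by sleeping
        return tokens >= 0 ? 0L : (long) Math.ceil(-tokens / bytesPerNano);
    }
}
```

A compactor would call it per block, e.g. `long w = throttle.acquire(blockBytes, System.nanoTime()); if (w > 0) TimeUnit.NANOSECONDS.sleep(w);`. Sleeping off the debt keeps the long-run rate at bytesPerSecond while still allowing short bursts up to burstBytes.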

>> - rack IO throttle. We should add that to accommodate for over
>> subscription at the ToR level.

Can you decipher that, Lars?

>> - cluster wide compaction storms. Yeah, that's bad. Can be alleviated by
>> spreading timed major compactions out. (in our clusters we set the interval
>> to 1 week and the jitter to 1/2 week)

I think we have some JIRAs for that?
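For reference, Lars's setting (interval 1 week, jitter 1/2 week) spreads timed major compactions roughly like this sketch, assuming the jitter is applied symmetrically around the interval. The storeId seed is a hypothetical stand-in for whatever per-store identity would be used.

```java
import java.util.SplittableRandom;
import java.util.concurrent.TimeUnit;

// Sketch of spreading timed major compactions: each store gets
// interval +/- jitter, drawn uniformly. Assumptions: symmetric jitter and a
// hypothetical numeric storeId used as the seed.
class MajorCompactionJitter {
    static final long WEEK_MS = TimeUnit.DAYS.toMillis(7);

    static long nextDelayMs(long intervalMs, long jitterMs, long storeId) {
        // Seeding per store keeps a store's schedule stable across restarts while
        // different stores land at different times. SplittableRandom mixes its
        // seed well; the first output of java.util.Random is poorly distributed
        // for consecutive seeds, which would cluster the schedule instead.
        double u = new SplittableRandom(storeId).nextDouble(); // in [0, 1)
        return intervalMs + Math.round((2 * u - 1) * jitterMs);
    }
}
```

With those numbers each store's next timed major compaction falls somewhere between 3.5 and 10.5 days out, instead of every store on the cluster firing in the same window.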

>> - what do you think about off-peak compaction? We have that in part as
>> the compaction ratio can be set differently for off peak hours

Off-peak compactions could have higher limits or even different policies.
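A sketch of the "higher limits off-peak" idea: pick the compaction I/O budget from the hour of day. The class, the wrap-around window and the two limits are illustrative assumptions; a different policy could be selected the same way.

```java
// Hypothetical off-peak-aware compaction limit: a higher I/O budget inside the
// off-peak window [startHour, endHour), which may wrap past midnight.
class OffPeakCompactionLimit {
    private final int startHour;
    private final int endHour;
    private final long peakBytesPerSec;
    private final long offPeakBytesPerSec;

    OffPeakCompactionLimit(int startHour, int endHour, long peakBytesPerSec, long offPeakBytesPerSec) {
        this.startHour = startHour;
        this.endHour = endHour;
        this.peakBytesPerSec = peakBytesPerSec;
        this.offPeakBytesPerSec = offPeakBytesPerSec;
    }

    boolean isOffPeak(int hour) {
        if (startHour == endHour) return false;                // empty window: never off-peak
        if (startHour < endHour) return hour >= startHour && hour < endHour;
        return hour >= startHour || hour < endHour;            // wraps midnight, e.g. 22..6
    }

    long limitFor(int hour) {
        return isOffPeak(hour) ? offPeakBytesPerSec : peakBytesPerSec;
    }
}
```

At runtime the hour would come from something like `java.time.LocalTime.now().getHour()`, and the result could feed the local throttle's rate.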

>> Generally I like the idea of being able to pace compaction better.
>> Do you want to file jiras for these?

Yeah, will do that.



On Sat, Oct 4, 2014 at 10:31 AM, lars hofhansl <[email protected]> wrote:

> Hi Vladimir,
>
> these are very interesting.
> A few comments:
> - bulkload - you mean not by loading pre-created HFiles? If you do that
> there would be no compaction during the import as the files are simply
> moved into place.
> - local compaction IO limit. Limiting the number of compaction threads (1
> by default) is not good enough ... ? You can cause too much harm even with
> a single thread compacting per region server?
>
> - rack IO throttle. We should add that to accommodate for over
> subscription at the ToR level.
> - cluster wide compaction storms. Yeah, that's bad. Can be alleviated by
> spreading timed major compactions out. (in our clusters we set the interval
> to 1 week and the jitter to 1/2 week)
> - what do you think about off-peak compaction? We have that in part as the
> compaction ratio can be set differently for off peak hours
>
>
> Generally I like the idea of being able to pace compaction better.
> Do you want to file jiras for these? Doesn't mean you have to do all the
> work :)
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, October 3, 2014 10:34 PM
> Subject: Compactions nice to have features
>
>
> I am thinking about the following:
>
> 1. Compaction on/off per CF, table, and cluster, for both minor and major
> compactions.
>
> Useful during bulk loads:
>
> - Disable compaction for table 'T'
> - Load 1B rows
> - Enable compaction for table 'T'
>
> 2. Local Compaction I/O throttle
>
> Set I/O limit per RS
>
> 3. Rack Compaction I/O throttle
>
> Set I/O limit per server rack. Good to control uplink bandwidth.
>
> 4. Cluster Compaction I/O throttle. Good to avoid compaction storms
>
> -Vladimir Rodionov
>
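Items 2 to 4 could be layered rather than treated as separate mechanisms. A rough sketch, assuming one compaction thread per RS (the default mentioned in this thread) and some central component, such as the master, that knows how many compactions are running in each rack: each running compaction gets the tightest of its local cap, its share of the rack budget and its share of the cluster budget. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical combination of local, rack and cluster compaction throttles.
// Assumes a central component tracks running compactions per rack.
class HierarchicalCompactionBudget {
    private final long localLimit;   // bytes/sec per RS (one compaction thread assumed)
    private final long rackLimit;    // bytes/sec for all compactions in one rack
    private final long clusterLimit; // bytes/sec for all compactions in the cluster
    private final Map<String, Integer> activePerRack = new HashMap<>();
    private int activeInCluster = 0;

    HierarchicalCompactionBudget(long localLimit, long rackLimit, long clusterLimit) {
        this.localLimit = localLimit;
        this.rackLimit = rackLimit;
        this.clusterLimit = clusterLimit;
    }

    synchronized void compactionStarted(String rack) {
        activePerRack.merge(rack, 1, Integer::sum);
        activeInCluster++;
    }

    synchronized void compactionFinished(String rack) {
        activePerRack.merge(rack, -1, Integer::sum);
        activeInCluster--;
    }

    // Rate one running compaction in `rack` may use: the tightest of the three limits.
    synchronized long allowedRate(String rack) {
        int inRack = Math.max(1, activePerRack.getOrDefault(rack, 0));
        int inCluster = Math.max(1, activeInCluster);
        return Math.min(localLimit, Math.min(rackLimit / inRack, clusterLimit / inCluster));
    }
}
```

With a cluster-wide cap, a burst of simultaneous compactions shrinks each one's rate instead of multiplying the total I/O.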
