Hi Vladimir,

These are very interesting.
A few comments:
- Bulk load: you mean a load that does not use pre-created HFiles? With
pre-created HFiles there would be no compaction during the import, as the files
are simply moved into place.
- Local compaction I/O limit: is limiting the number of compaction threads (1 by
default) not good enough? Can even a single compacting thread per region server
cause too much harm?
- Rack I/O throttle: we should add that to account for oversubscription at the
ToR (top-of-rack) switch level.
- Cluster-wide compaction storms: yeah, that's bad. They can be alleviated by
spreading time-based major compactions out (in our clusters we set the interval
to 1 week and the jitter to 1/2 week).
- What do you think about off-peak compaction? We already have part of that, as
the compaction ratio can be set differently for off-peak hours.
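Here is a minimal Java sketch of a jittered major-compaction schedule that illustrates the jitter point. It assumes the jitter is a fraction of the interval applied symmetrically. That is a reading of HBase's hbase.hregion.majorcompaction (interval, in ms) and hbase.hregion.majorcompaction.jitter settings, and HBase's exact formula may differ. The class and method names are hypothetical. With a 1-week interval and a jitter of 0.5 (1/2 week), each store's next major compaction lands somewhere between 3.5 and 10.5 days out, so stores do not all compact at the same moment.

```java
import java.util.Random;

/**
 * Hypothetical sketch: spreads major compactions out by shifting the
 * interval by a random offset of up to +/- (jitter * interval).
 */
public class MajorCompactionJitter {
    static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;

    /** Delay until the next major compaction, in [interval - j, interval + j]. */
    static long nextDelayMs(long intervalMs, double jitter, Random rnd) {
        long maxOffset = Math.round(intervalMs * jitter);
        double u = rnd.nextDouble() * 2.0 - 1.0; // uniform in [-1, 1)
        return intervalMs + Math.round(u * maxOffset);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // One week interval, jitter 0.5: delays fall between 3.5 and 10.5 days.
        for (int store = 0; store < 5; store++) {
            long delay = nextDelayMs(WEEK_MS, 0.5, rnd);
            System.out.printf("store %d: next major compaction in %.1f days%n",
                    store, delay / 86_400_000.0);
        }
    }
}
```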
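The off-peak idea can be sketched as choosing a more aggressive compaction ratio when the current hour falls inside a configured window. In ratio-based selection, a file roughly qualifies if it is no larger than the ratio times the combined size of the newer files. A higher ratio therefore pulls larger files into a compaction. The sketch assumes a window of [start, end) hours that may wrap past midnight. It loosely follows HBase's hbase.offpeak.start.hour, hbase.offpeak.end.hour, hbase.hstore.compaction.ratio and hbase.hstore.compaction.ratio.offpeak settings. The class name and the ratio values in main are illustrative only.

```java
/**
 * Hypothetical sketch: pick the compaction ratio based on whether the
 * given hour lies in the off-peak window.
 */
public class OffPeakRatio {
    private final int startHour;      // inclusive, 0-23
    private final int endHour;        // exclusive, 0-23
    private final double peakRatio;
    private final double offPeakRatio;

    OffPeakRatio(int startHour, int endHour, double peakRatio, double offPeakRatio) {
        this.startHour = startHour;
        this.endHour = endHour;
        this.peakRatio = peakRatio;
        this.offPeakRatio = offPeakRatio;
    }

    /** True if hour is inside the window; handles windows that wrap past midnight. */
    boolean isOffPeak(int hour) {
        if (startHour == endHour) {
            return false; // empty window: off-peak disabled
        }
        if (startHour < endHour) {
            return hour >= startHour && hour < endHour;
        }
        return hour >= startHour || hour < endHour; // e.g. 22:00 to 06:00
    }

    double ratioFor(int hour) {
        return isOffPeak(hour) ? offPeakRatio : peakRatio;
    }

    public static void main(String[] args) {
        // Example values: off-peak from 22:00 to 06:00, ratios 1.2 (peak) and 5.0 (off-peak).
        OffPeakRatio policy = new OffPeakRatio(22, 6, 1.2, 5.0);
        for (int hour : new int[] {3, 12, 23}) {
            System.out.println(hour + ":00 -> ratio " + policy.ratioFor(hour));
        }
    }
}
```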


Generally I like the idea of being able to pace compaction better.
Do you want to file JIRAs for these? That doesn't mean you have to do all the work :)


-- Lars



________________________________
 From: Vladimir Rodionov <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, October 3, 2014 10:34 PM
Subject: Compactions nice to have features
 

I am thinking about the following:

1. Compaction on/off per column family (CF), table, and cluster, for both minor and major compactions

Useful during bulk loads:

- Disable compaction for table 'T'
- Load 1B rows
- Enable compaction for table 'T'

2. Local Compaction I/O throttle

Set an I/O limit per region server (RS).

3. Rack Compaction I/O throttle

Set an I/O limit per server rack. Useful for controlling uplink bandwidth.

4. Cluster Compaction I/O throttle

Useful for avoiding compaction storms.
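Here is a minimal sketch of the on/off switches from item 1. It assumes that a compaction of a given type (minor or major) may run only if that type is enabled at the cluster, table and column-family level. This is not an existing HBase API, and all names are hypothetical. The main method walks through the bulk-load sequence from item 1.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical compaction switches; only disabled scopes are stored, so the default is "on". */
public class CompactionSwitches {
    enum Type { MINOR, MAJOR }

    private final Set<Type> clusterDisabled = EnumSet.noneOf(Type.class);
    private final Map<String, Set<Type>> tableDisabled = new HashMap<>();
    private final Map<String, Set<Type>> familyDisabled = new HashMap<>(); // key "table:family"

    void setCluster(Type t, boolean enabled) {
        toggle(clusterDisabled, t, enabled);
    }

    void setTable(String table, Type t, boolean enabled) {
        toggle(tableDisabled.computeIfAbsent(table, k -> EnumSet.noneOf(Type.class)), t, enabled);
    }

    void setFamily(String table, String family, Type t, boolean enabled) {
        toggle(familyDisabled.computeIfAbsent(table + ":" + family, k -> EnumSet.noneOf(Type.class)),
                t, enabled);
    }

    /** A compaction runs only if no enclosing scope has disabled this type. */
    boolean allowed(String table, String family, Type t) {
        return !clusterDisabled.contains(t)
                && !tableDisabled.getOrDefault(table, EnumSet.noneOf(Type.class)).contains(t)
                && !familyDisabled.getOrDefault(table + ":" + family, EnumSet.noneOf(Type.class)).contains(t);
    }

    private static void toggle(Set<Type> disabled, Type t, boolean enabled) {
        if (enabled) {
            disabled.remove(t);
        } else {
            disabled.add(t);
        }
    }

    public static void main(String[] args) {
        CompactionSwitches s = new CompactionSwitches();
        for (Type t : Type.values()) s.setTable("T", t, false); // disable compaction for 'T'
        System.out.println("during load: " + s.allowed("T", "cf", Type.MINOR));
        // ... load 1B rows ...
        for (Type t : Type.values()) s.setTable("T", t, true);  // enable compaction for 'T'
        System.out.println("after load: " + s.allowed("T", "cf", Type.MINOR));
    }
}
```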
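Items 2 to 4 could be modelled as a hierarchy of token buckets: one per region server, one per rack and one for the cluster. A compaction may write a chunk only when every level has enough budget. This Java sketch is hypothetical and not an HBase API. It assumes the rack and cluster buckets could be shared between servers, for example through a coordinator. Here they are plain in-process objects, and the limits are made-up numbers.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical hierarchical compaction I/O throttle built from token buckets. */
public class CompactionThrottle {
    /** Token bucket in bytes; time is passed in explicitly to keep it testable. */
    static final class TokenBucket {
        final String name;
        final double bytesPerSec;
        final double capacity; // maximum burst, in bytes
        double tokens;
        long lastNanos;

        TokenBucket(String name, double bytesPerSec, double capacity, long nowNanos) {
            this.name = name;
            this.bytesPerSec = bytesPerSec;
            this.capacity = capacity;
            this.tokens = capacity;
            this.lastNanos = nowNanos;
        }

        void refill(long nowNanos) {
            double elapsedSec = (nowNanos - lastNanos) / 1e9;
            tokens = Math.min(capacity, tokens + elapsedSec * bytesPerSec);
            lastNanos = nowNanos;
        }
    }

    /** Debits every bucket in the chain, or none of them if any level lacks budget. */
    static synchronized boolean tryAcquire(List<TokenBucket> chain, long bytes, long nowNanos) {
        for (TokenBucket b : chain) b.refill(nowNanos);
        for (TokenBucket b : chain) {
            if (b.tokens < bytes) return false;
        }
        for (TokenBucket b : chain) b.tokens -= bytes;
        return true;
    }

    public static void main(String[] args) {
        final long MB = 1L << 20;
        // Made-up limits: RS 50 MB/s, rack 100 MB/s (60 MB burst), cluster 400 MB/s.
        TokenBucket rs1 = new TokenBucket("rs1", 50 * MB, 50 * MB, 0L);
        TokenBucket rs2 = new TokenBucket("rs2", 50 * MB, 50 * MB, 0L);
        TokenBucket rack = new TokenBucket("rack-A", 100 * MB, 60 * MB, 0L);
        TokenBucket cluster = new TokenBucket("cluster", 400 * MB, 400 * MB, 0L);

        // Both servers sit in rack A and each wants to write a 40 MB chunk at t = 0.
        System.out.println("rs1: " + tryAcquire(Arrays.asList(rs1, rack, cluster), 40 * MB, 0L));
        System.out.println("rs2: " + tryAcquire(Arrays.asList(rs2, rack, cluster), 40 * MB, 0L));
        // Half a second later the rack bucket has refilled enough for rs2.
        System.out.println("rs2 retry: "
                + tryAcquire(Arrays.asList(rs2, rack, cluster), 40 * MB, 500_000_000L));
    }
}
```

Debiting all levels or none means a refusal at the rack level does not burn region-server budget. A caller that gets false would back off and retry, and that retry loop is what paces the compaction.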

-Vladimir Rodionov
