>> A few comments:
>> - bulkload - you mean not by loading pre-created HFiles? If you do that there would be no compaction during the import as the files are simply moved
>> into place.

Bulk load is not always convenient or feasible. We have batched mutation support in the API, but compaction is still a serious issue. Cassandra allows disabling and enabling compactions (I think it's cluster-wide, not sure though), so why shouldn't we have that too?

>>- local compaction IO limit. Limiting the number of compaction threads (1 by default) is not good enough ... ? You can cause too much harm even with a
>> single thread compacting per region server?

This one I am not sure about myself. The idea is to make compaction nicer with respect to I/O. For example, read operations and memstore flushes should have higher priority than compaction I/O. One way is to limit (throttle) compaction bandwidth locally; there are other approaches as well.

>>- rack IO throttle. We should add that to accommodate for over subscription at the ToR level.

Can you decipher that, Lars?

>>- cluster wide compaction storms. Yeah, that's bad. Can be alleviated by spreading timed major compactions out. (in our clusters we set the interval to 1 week and the jitter to 1/2 week)

I think we have some JIRAs for that?

>>- what do you think about off-peak compaction? We have that in part as the compaction ratio can be set differently for off peak hours

Off-peak compactions can have higher limits or even different policies.

>>Generally I like the idea of being able to pace compaction better.
>>Do you want to file jiras for these?

Yeah, will do that.

On Sat, Oct 4, 2014 at 10:31 AM, lars hofhansl <la...@apache.org> wrote:

> Hi Vladimir,
>
> these are very interesting.
> A few comments:
> - bulkload - you mean not by loading pre-created HFiles? If you do that
> there would be no compaction during the import as the files are simply
> moved into place.
> - local compaction IO limit. Limiting the number of compaction threads (1
> by default) is not good enough ... ? You can cause too much harm even with
> a single thread compacting per region server?
>
> - rack IO throttle.
> We should add that to accommodate for over
> subscription at the ToR level.
> - cluster wide compaction storms. Yeah, that's bad. Can be alleviated by
> spreading timed major compactions out. (in our clusters we set the interval
> to 1 week and the jitter to 1/2 week)
> - what do you think about off-peak compaction? We have that in part as the
> compaction ratio can be set differently for off peak hours
>
> Generally I like the idea of being able to pace compaction better.
> Do you want to file jiras for these? Doesn't mean you have to do all the
> work :)
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov <vladrodio...@gmail.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Sent: Friday, October 3, 2014 10:34 PM
> Subject: Compactions nice to have features
>
> I am thinking about the following:
>
> 1. Compaction On/Off per CF, Table, cluster. Both: minor and major
>
> Good during bulk load.
>
> - Disable compaction for table 'T'
> - Load 1B rows
> - Enable compaction for table 'T'
>
> 2. Local Compaction I/O throttle
>
> Set I/O limit per RS
>
> 3. Rack Compaction I/O throttle
>
> Set I/O limit per server rack. Good to control uplink bandwidth.
>
> 4. Cluster Compaction I/O throttle. Good to avoid compaction storms
>
> -Vladimir Rodionov
>
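
To make the "local compaction I/O throttle" idea (item 2 above) a bit more concrete, here is a minimal sketch of one possible approach, a token-bucket limiter. It is written in Python for brevity and is not HBase code: the class name `CompactionThrottle`, its parameters, and the idea of calling `acquire()` before each compaction write are all assumptions made for illustration. It is also single-threaded; a real region server would need locking and a way to give reads and memstore flushes priority.

```python
import time


class CompactionThrottle:
    """Hypothetical token-bucket limiter for compaction I/O on one server.

    rate_bytes_per_sec: sustained bandwidth allowed for compaction.
    burst_bytes: how many bytes may be written at once without waiting.
    clock / sleep: injectable so the limiter can be exercised without real waits.
    """

    def __init__(self, rate_bytes_per_sec, burst_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, nbytes):
        """Account for nbytes of compaction I/O; return the seconds waited.

        If the bucket goes negative, the caller sleeps long enough for the
        deficit to be paid back at the configured rate.
        """
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        delay = -self.tokens / self.rate
        self.sleep(delay)
        return delay
```

A compaction loop would call `throttle.acquire(len(block))` before writing each block, so that sustained compaction bandwidth stays near `rate_bytes_per_sec` while short bursts up to `burst_bytes` pass through without delay. The same shape could in principle be applied per rack or per cluster, but that needs coordination between servers, which this sketch does not attempt.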