[ https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562374#action_12562374 ]

Billy Pearson commented on HADOOP-2615:
---------------------------------------

That's what I see too: the split never happens when a region is under heavy insert load. I still think that if we are going to get transaction speeds close to Bigtable's, we will need to add a limit on the number of map files compacted at one time.
Even if HADOOP-2636 gets flushing working right from a performance point of view, I think this limit should be included anyway as a way to handle a large number of regions per server.
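
A minimal sketch of what I mean, assuming a hypothetical cap value (the class name and settings here are made up for illustration, not the actual patch):

{code:java}
// Sketch: cap how many map files a single compaction touches.
// Config names and defaults are hypothetical, not real HBase settings.
import java.util.Collections;
import java.util.List;

public class CompactionSelector {

  private final int compactionThreshold;   // existing hbase.hstore.compactionThreshold (default 3)
  private final int maxFilesPerCompaction; // proposed cap on map files per compaction

  public CompactionSelector(int compactionThreshold, int maxFilesPerCompaction) {
    this.compactionThreshold = compactionThreshold;
    this.maxFilesPerCompaction = maxFilesPerCompaction;
  }

  /**
   * Pick the map files for the next compaction. If more than the cap are
   * eligible, only the newest ones are taken (a minor compaction); once
   * updates slack off and everything fits under the cap, a pass naturally
   * becomes a major compaction of the whole region.
   */
  public List<String> select(List<String> mapFilesNewestFirst) {
    if (mapFilesNewestFirst.size() < compactionThreshold) {
      return Collections.emptyList(); // not enough files to bother
    }
    int take = Math.min(mapFilesNewestFirst.size(), maxFilesPerCompaction);
    return mapFilesNewestFirst.subList(0, take);
  }
}
{code}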

I am seeing 10-15 minutes to run a compaction on a 90MB region using block compression, and consider that most people will want to handle more than 25-50 regions per server.

Say the average region server holds 100 regions; that works out to 100 * 10 mins = 1000 mins, or roughly 16.7 hours, to run a full compaction on all the regions.
By having this limit in place, the map files on regions getting heavy update traffic will not get out of control.

100 regions with a 90MB average size is only about 9GB of compressed data.
Closer to a production release I would like to see a better compression method used.
That would help with compaction speed; right now my bottleneck during compaction is compression.
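
To illustrate how much the codec and compression level matter, here is a standalone java.util.zip timing toy (not HBase code, just an illustration):

{code:java}
// Rough illustration: the same bytes deflated at BEST_SPEED vs
// BEST_COMPRESSION can differ a lot in wall-clock time, which is why the
// compression method dominates compaction speed.
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionCost {
  public static void main(String[] args) {
    // Semi-compressible stand-in for a flushed map file's bytes.
    byte[] data = new byte[8 * 1024 * 1024];
    Random rnd = new Random(42);
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) ('a' + rnd.nextInt(8));
    }

    for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
      Deflater deflater = new Deflater(level);
      deflater.setInput(data);
      deflater.finish();
      byte[] out = new byte[data.length];
      long compressed = 0;
      long start = System.nanoTime();
      while (!deflater.finished()) {
        compressed += deflater.deflate(out);
      }
      long ms = (System.nanoTime() - start) / 1_000_000;
      deflater.end();
      System.out.println("level " + level + ": " + compressed + " bytes, " + ms + " ms");
    }
  }
}
{code}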

{New Idea}
After thinking on this a little, I am not sure that triggering compaction on the number of map files is the best way to go.
Compacting 3-6 small 1-2MB map files does not take that long even with compression, so the ideal approach would be to compact only small files while we have small files to compact, leaving the larger map files to be compacted later when load is not as high.

Bigtable has the right idea: only do a full/major compaction of all the map files every so often, to remove deleted data or data outside its max version range.
So we might want to look at replacing the trigger based on the number of map files with a limit on the size of the map files. For example, say a region family has a compaction max size of 16MB: we would only compact files under that size, and once a compacted file grows past the max compaction size we would not include it in the next compaction. This would leave map files of around the same size to be compacted together, say once a day and/or after splits.
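
A quick sketch of that size-based selection (the 16MB figure and the class names are just assumptions for illustration):

{code:java}
// Sketch: pick only map files under a configured max size for minor
// compactions; files that have grown past the limit wait for the periodic
// major compaction. Names and defaults are hypothetical.
import java.util.ArrayList;
import java.util.List;

public class SizeBasedSelector {

  /** For this sketch a map file is just a name plus its on-disk size. */
  public static class MapFile {
    final String name;
    final long sizeBytes;

    MapFile(String name, long sizeBytes) {
      this.name = name;
      this.sizeBytes = sizeBytes;
    }
  }

  private final long maxCompactSizeBytes; // e.g. 16 * 1024 * 1024

  public SizeBasedSelector(long maxCompactSizeBytes) {
    this.maxCompactSizeBytes = maxCompactSizeBytes;
  }

  /** Keep only the files still small enough to be worth merging now. */
  public List<MapFile> selectSmallFiles(List<MapFile> candidates) {
    List<MapFile> picked = new ArrayList<>();
    for (MapFile f : candidates) {
      if (f.sizeBytes < maxCompactSizeBytes) {
        picked.add(f);
      }
    }
    return picked;
  }
}
{code}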
I would also like the region servers to handle compaction on their own, so the master can be left alone to do other, more important tasks.
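
Along the same lines, each region server could keep its own clock for when a full compaction is due, with no master involvement (the interval and method names below are assumptions):

{code:java}
// Sketch: a region server decides on its own when a major compaction is due,
// e.g. once a day and/or right after a split. No master coordination.
public class MajorCompactionClock {

  private final long intervalMs; // e.g. 24 * 60 * 60 * 1000L for once a day
  private long lastMajorCompactionMs;

  public MajorCompactionClock(long intervalMs) {
    this.intervalMs = intervalMs;
    this.lastMajorCompactionMs = System.currentTimeMillis();
  }

  /** True when it is time to compact all map files in the region. */
  public boolean isMajorCompactionDue(boolean justSplit) {
    return justSplit
        || (System.currentTimeMillis() - lastMajorCompactionMs) >= intervalMs;
  }

  /** Call after a major compaction completes to reset the clock. */
  public void markMajorCompactionDone() {
    lastMajorCompactionMs = System.currentTimeMillis();
  }
}
{code}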

Currently, if you load a region server with many regions, it will always be running compactions on the regions that are getting data inserted.
This change would lessen the load on the hard drives, memory, and CPUs, giving more resources for faster/more transactions.

> Add max number of mapfiles to compact at one time giving us a minor & major 
> compaction
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2615
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2615
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: flag.patch, twice.patch
>
>
> Currently we do compaction on a region when the 
> hbase.hstore.compactionThreshold is reached - default 3.
> I think we should configure a max number of mapfiles to compact at one time, 
> similar to doing a minor compaction in Bigtable. This keeps compactions 
> from getting tied up in one region too long, which lets other regions build up 
> way too many memcache flushes, making compaction take longer and longer for each region.
> If we did that, then when a region's updates start to slack off, the max number will 
> eventually include all mapfiles, causing a major compaction on that region. 
> Unlike Bigtable, this would leave the master out of the process, letting 
> the region server handle the major compaction when it has time.
> When doing a minor compaction on a few files, I think we should compact the 
> newest mapfiles first, leaving the larger/older ones for when we have low 
> updates to a region.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
