[
https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562374#action_12562374
]
Billy Pearson commented on HADOOP-2615:
---------------------------------------
That's what I see too: the split never happens when a region is under a load
of inserts. I still think that if we are going to get transaction speeds close
to Bigtable's, we will need to add a limit on the number of map files to
compact at one time.
Even if HADOOP-2636 gets flushing working right, from a performance point of
view I think this limit should be included anyway, as a way to handle a large
number of regions per server.
I am seeing 10-15 mins to run a compaction on a 90MB region using block
compression.
Consider that most will want to handle more than 25-50 regions per server.
Say an avg region server holds 100 regions: that works out to
100 * 10 mins = 1000 mins, or about 16.7 hours, to run a full compaction on
all the regions.
By having this in place on regions getting large update traffic, the map files
will not get out of control.
100 regions with 90MB avg size only equals about 9GB of compressed data.
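The arithmetic above can be checked with a rough sketch like the one below. It
only restates the figures from this comment; 10 mins per region is the low end
of the 10-15 min range I am seeing, so the real total could be up to 50% higher.

```python
# Back-of-envelope estimate using only the figures quoted in this comment.
regions_per_server = 100
minutes_per_compaction = 10   # low end of the observed 10-15 min range
avg_region_size_mb = 90

# Time for one full compaction pass over every region on the server.
total_minutes = regions_per_server * minutes_per_compaction
total_hours = total_minutes / 60

# Total compressed data held by the server (1GB taken as 1000MB).
total_data_gb = regions_per_server * avg_region_size_mb / 1000

print(f"full compaction pass: {total_minutes} mins (~{total_hours:.1f} hours)")
print(f"compressed data on the server: ~{total_data_gb:.0f}GB")
```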
Closer to the production release, I would like to see a better compression
method used.
This would help with compaction speed; right now my bottleneck on compaction
is compression.
{New Idea}
After thinking on this a little, I am not sure that compacting based on the
number of map files is the best way to go.
Compaction on 3-6 small 1-2MB map files does not take that long even with
compression, so the ideal way to do this would be to compact only small files
while we have small files to compact, leaving the larger map files to compact
at the end, when load is not as high.
Bigtable has the right idea: only do a full/major compaction of all the map
files every so often, to remove deleted data or data outside its max version
range.
So we might want to look at replacing the compaction trigger based on the
number of map files with a limit on the size of the map files.
For example, say we have a region family max compaction size of 16MB: we would
only compact files under that size. Once we compact files that total more than
the max compaction size, we do not include the resulting map file in the next
compaction.
This would leave map files of around the same size to be compacted together,
say once a day and/or after splits.
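A minimal sketch of the size-based selection described above, in Python rather
than HBase's Java just to keep it short. This is not existing HBase code: the
function names and the max_compaction_size_mb parameter are made up for
illustration, and map files are modeled as plain sizes in MB.

```python
def select_for_minor_compaction(file_sizes_mb, max_compaction_size_mb=16):
    """Return the map file sizes eligible for the next minor compaction.

    Files at or above the limit are skipped and left for a later
    major compaction.
    """
    return [size for size in file_sizes_mb if size < max_compaction_size_mb]


def minor_compact(file_sizes_mb, max_compaction_size_mb=16):
    """Simulate one minor compaction and return the new list of file sizes."""
    small = select_for_minor_compaction(file_sizes_mb, max_compaction_size_mb)
    if len(small) < 2:
        return list(file_sizes_mb)  # fewer than two small files: nothing to merge
    large = [s for s in file_sizes_mb if s >= max_compaction_size_mb]
    # Simplification: the merged size is the plain sum of the inputs,
    # ignoring deletes, expired versions, and compression effects.
    return large + [sum(small)]


files = [64, 2, 1, 2, 6, 7]
after_first = minor_compact(files)         # small files merge into one 18MB file
after_second = minor_compact(after_first)  # 18MB is over the limit, so it is skipped
print(after_first, after_second)
```

In this example the merged 18MB file is over the 16MB limit, so the second pass
leaves it alone until a major compaction picks it up.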
Also, I would like to keep the region servers handling compaction on their
own, so the master can be left alone to do other, more important tasks.
Currently, if you load a region server with many regions, it will always be
running compactions on the regions if they are getting data inserted.
So this would lessen the load on the hard drives, memory, and CPUs, giving
more resources for faster/more transactions.
> Add max number of mapfiles to compact at one time giving us a minor & major
> compaction
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-2615
> URL: https://issues.apache.org/jira/browse/HADOOP-2615
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hbase
> Reporter: Billy Pearson
> Priority: Minor
> Fix For: 0.17.0
>
> Attachments: flag.patch, twice.patch
>
>
> Currently we do a compaction on a region when
> hbase.hstore.compactionThreshold is reached (default 3).
> I think we should configure a max number of mapfiles to compact at one time,
> similar to doing a minor compaction in Bigtable. This keeps compactions
> from getting tied up in one region too long, which lets other regions get way
> too many memcache flushes and makes compaction take longer and longer for each
> region.
> If we did that, when a region's updates start to slack off the max number will
> eventually include all mapfiles, causing a major compaction on that region.
> Unlike Bigtable, this would leave the master out of the process and let
> the region server handle the major compaction when it has time.
> When doing a minor compaction on a few files, I think we should compact the
> newest mapfiles first and leave the larger/older ones for when we have low
> updates to a region.
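
The selection rule in the issue description could be sketched roughly as
follows, in Python rather than HBase's Java for brevity. This is an illustration
under assumptions, not HBase code: the function name, the max_files parameter,
and the (name, sequence) tuples are hypothetical. Only the threshold default of
3 comes from the description of hbase.hstore.compactionThreshold.

```python
def pick_files_to_compact(mapfiles, compaction_threshold=3, max_files=4):
    """Pick map files for the next compaction, newest first.

    mapfiles: list of (name, sequence) tuples, where a higher sequence
    means a newer file (a hypothetical stand-in for flush order).
    Returns (picked, is_major); is_major is True when every map file in
    the store was picked, which amounts to a major compaction.
    """
    if len(mapfiles) < compaction_threshold:
        return [], False  # not enough files yet to bother compacting
    newest_first = sorted(mapfiles, key=lambda f: f[1], reverse=True)
    picked = newest_first[:max_files]
    return picked, len(picked) == len(mapfiles)


busy_region = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("f", 6)]
quiet_region = [("a", 1), ("b", 2), ("c", 3)]
print(pick_files_to_compact(busy_region))   # minor: only the four newest files
print(pick_files_to_compact(quiet_region))  # major: all files fit under the cap
```

Once updates slack off and the file count drops to the cap or below, the same
rule picks every file, which is the major compaction case described above.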