[ https://issues.apache.org/jira/browse/HBASE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138848#comment-13138848 ]

Ted Yu commented on HBASE-4063:
-------------------------------

RegionLoad carries statistics about each region, such as the total uncompressed
size of the region's store files, in MB.
We should use this information to form balanced region groups.
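One way to act on this is a simple greedy bin-packing pass over region sizes. The sketch below is a hypothetical illustration, not HBase code: `BalancedRegionGroups` and `group` are made-up names, and it assumes each region's size in MB has already been read from RegionLoad and is passed in as a plain `Map`. It uses the "largest region first, into the currently lightest group" heuristic, which keeps group totals close but does not guarantee an optimal partition.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: region sizes (MB) are assumed to come from RegionLoad
// and are passed in as a map from region name to size.
class BalancedRegionGroups {

    // Greedy "largest first" packing: sort regions by size, descending, and
    // always add the next region to the group with the smallest total so far.
    static List<List<String>> group(Map<String, Integer> sizeMbByRegion, int numGroups) {
        if (numGroups <= 0) {
            throw new IllegalArgumentException("numGroups must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        long[] totals = new long[numGroups];
        for (int i = 0; i < numGroups; i++) {
            groups.add(new ArrayList<>());
        }
        // Min-heap of group indices ordered by current total (ties: lower index).
        PriorityQueue<Integer> lightest = new PriorityQueue<>(
            Comparator.comparingLong((Integer g) -> totals[g]).thenComparingInt(g -> g));
        for (int i = 0; i < numGroups; i++) {
            lightest.add(i);
        }
        // Largest regions first; region name breaks ties deterministically.
        List<Map.Entry<String, Integer>> regions = new ArrayList<>(sizeMbByRegion.entrySet());
        regions.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        for (Map.Entry<String, Integer> region : regions) {
            int g = lightest.poll();           // group with the smallest total
            groups.get(g).add(region.getKey());
            totals[g] += region.getValue();    // safe: g is not in the heap now
            lightest.add(g);                   // re-insert with its new total
        }
        return groups;
    }
}
```

For example, regions `a`, `b`, `c`, `d` of 10, 9, 8 and 1 MB split into two groups come out as `[a, d]` (11 MB) and `[b, c]` (17 MB); each group could then back one mapper.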
                
> Improve TableInputFormat to allow application to configure the number of mappers
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-4063
>                 URL: https://issues.apache.org/jira/browse/HBASE-4063
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>
> TableInputFormat creates one split (and hence one mapper task) per region.
> When there are many small regions, the per-task overhead of the MapReduce
> framework becomes significant.
> Several related work items could address this issue:
> 1. Reduce the number of small regions.
> https://issues.apache.org/jira/browse/HBASE-420
> 2. Improve the MapReduce framework's handling of small jobs.
> https://issues.apache.org/jira/browse/MAPREDUCE-1220
> Another quick solution is to improve TableInputFormat so that it can pack a
> configurable number of regions from a given region server into one mapper
> task. I tested this approach and achieved a 40% improvement in map job
> latency.
> In addition, Ophir Cohen suggested supporting multiple mappers per region,
> as quoted below.
> On Thu, Jun 30, 2011 at 8:38 AM, Ophir Cohen wrote:
> > Actually, I was thinking of the opposite:
> > If I have spare map slots, why not configure it to run more than one
> > mapper per region?
> > The question then is how to make each mapper 'skip' to its own starting
> > point inside the region.
> Well, the current splitter passes mappers Scans whose start/end rows are
> the region boundaries (as of the time the splitter ran).
> To handle your case, the splitter would just hand out multiple splits per
> region. To cut up a region's key space, you might use the Bytes.split code,
> which does coarse BigInteger math to divide the key space. See here:
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/Bytes.html#1034
> St.Ack
> To support both scenarios:
> a) one mapper for multiple regions, and
> b) multiple mappers for one region,
> we can modify TableInputFormat to let the application configure the number
> of mappers. TableInputFormat would then work out internally how to assign
> key ranges to the mappers.
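Both scenarios can be sketched independently of TableInputFormat. This is a hypothetical illustration only: `SplitStrategies`, `RegionRef`, `packByServer` and `splitKeyRange` are made-up names, not HBase APIs. `packByServer` groups regions by hosting server and hands out batches of a configurable size (scenario a). `splitKeyRange` applies the idea St.Ack describes for Bytes.split, treating row keys as unsigned big-endian integers and cutting one region's key range into roughly equal pieces (scenario b); it is not a copy of Bytes.split, and it assumes non-empty start and end keys, so the first and last regions of a table (whose boundary keys are empty) would need extra handling.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two split strategies; none of these are HBase APIs.
class SplitStrategies {

    // Minimal stand-in for a region: its name and the server hosting it.
    record RegionRef(String regionName, String serverName) {}

    // Scenario a): group regions by hosting server, then chunk each server's
    // regions into batches of at most regionsPerMapper (one batch per mapper).
    static List<List<RegionRef>> packByServer(List<RegionRef> regions, int regionsPerMapper) {
        if (regionsPerMapper <= 0) {
            throw new IllegalArgumentException("regionsPerMapper must be positive");
        }
        Map<String, List<RegionRef>> byServer = new LinkedHashMap<>();
        for (RegionRef r : regions) {
            byServer.computeIfAbsent(r.serverName(), s -> new ArrayList<>()).add(r);
        }
        List<List<RegionRef>> batches = new ArrayList<>();
        for (List<RegionRef> serverRegions : byServer.values()) {
            for (int i = 0; i < serverRegions.size(); i += regionsPerMapper) {
                int end = Math.min(i + regionsPerMapper, serverRegions.size());
                batches.add(new ArrayList<>(serverRegions.subList(i, end)));
            }
        }
        return batches;
    }

    // Scenario b): cut [startKey, endKey) into numSplits pieces by treating keys
    // as unsigned big-endian integers. Returns numSplits + 1 boundary keys;
    // boundaries i and i + 1 delimit piece i. Assumes non-empty keys.
    static List<byte[]> splitKeyRange(byte[] startKey, byte[] endKey, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        int len = Math.max(startKey.length, endKey.length);
        // Right-padding with 0x00 keeps numeric order consistent with byte order.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, len));
        BigInteger step = hi.subtract(lo).divide(BigInteger.valueOf(numSplits));
        if (step.signum() <= 0) {
            // startKey >= endKey, or fewer distinct keys than requested pieces.
            throw new IllegalArgumentException("key range too narrow to split");
        }
        List<byte[]> boundaries = new ArrayList<>();
        boundaries.add(startKey.clone());
        for (int i = 1; i < numSplits; i++) {
            BigInteger point = lo.add(step.multiply(BigInteger.valueOf(i)));
            boundaries.add(toFixedWidth(point, len));
        }
        boundaries.add(endKey.clone());
        return boundaries;
    }

    // Converts a non-negative value below 256^len into exactly len bytes.
    private static byte[] toFixedWidth(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may include a leading 0x00 sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Note that equal slices of key space are not necessarily equal slices of data: if rows cluster in part of a region's key range, some mappers will still receive more work than others, which is where size statistics such as those in RegionLoad could help.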

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
