[ 
https://issues.apache.org/jira/browse/HBASE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-4063.
----------------------------------------
      Assignee:     (was: Ming Ma)
    Resolution: Incomplete

> Improve TableInputFormat to allow application to configure the number of 
> mappers
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-4063
>                 URL: https://issues.apache.org/jira/browse/HBASE-4063
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Ming Ma
>            Priority: Major
>
> TableInputFormat creates one split (and thus one mapper task) per region. 
> With many small regions, the per-task overhead of the MapReduce framework 
> becomes significant. Several related work items could address this issue.
> 1.    Reduce the number of small regions. 
> https://issues.apache.org/jira/browse/HBASE-420 
> 2.    Improvement in map reduce framework to handle small jobs. 
> https://issues.apache.org/jira/browse/MAPREDUCE-1220 
> Another quick way to solve this is to just improve TableInputFormat so that 
> it can pack a configurable number of regions from a given region server into 
> one mapper task. I tested this approach and achieved a 40% 
> improvement in map job latency.
> In addition, Ophir Cohen suggested support for multiple mappers per region as 
> below.
> On Thu, Jun 30, 2011 at 8:38 AM, Ophir Cohen <[email protected]> wrote:
> > Actually I thought of the opposite version:
> > If I have spare map slots, why not configure it to run more than one mapper
> > per region?
> > The question then is how to 'skip' the mappers to the needed places inside
> > the regions.
> Well, the current splitter passes mappers Scans where the start/end
> rows are the region boundaries (at the time at which the splitter
> ran).
> To do your case,  in the splitter, you'd just give out multiple splits
> per region.  To cut up the region key-space, you might use the
> Bytes.split code.  It does coarse BigNumber math dividing the key
> space.  See here:
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/Bytes.html#1034
> St.Ack
> To support the scenarios of:
> a) One mapper for multiple regions.
> b) Multiple mappers for one region.
> We can modify TableInputFormat to let the application configure the number 
> of mappers. TableInputFormat would then calculate internally how to assign 
> a key range to each mapper.
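
The two scenarios above can be sketched in plain Java without any HBase
dependency. This is only an illustration under stated assumptions, not the
change proposed in the issue: SplitPlanner, packByServer, and splitRange are
hypothetical names, regions are represented by plain strings, and keys are
assumed to be non-empty byte arrays (the empty boundaries of a table's first
and last regions are not handled). splitRange mimics the idea behind
Bytes.split using BigInteger arithmetic rather than calling it.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of HBase. */
final class SplitPlanner {

    /** Scenario (a): group each server's regions into chunks of at most regionsPerSplit. */
    static List<List<String>> packByServer(Map<String, List<String>> regionsByServer,
                                           int regionsPerSplit) {
        if (regionsPerSplit < 1) {
            throw new IllegalArgumentException("regionsPerSplit must be >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (List<String> regions : regionsByServer.values()) {
            // Never mix servers in one group, so each mapper can stay local to one server.
            for (int i = 0; i < regions.size(); i += regionsPerSplit) {
                int to = Math.min(i + regionsPerSplit, regions.size());
                groups.add(new ArrayList<>(regions.subList(i, to)));
            }
        }
        return groups;
    }

    /** Scenario (b): cut [start, end) into n pieces; returns the n + 1 boundary keys. */
    static byte[][] splitRange(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Right-pad the shorter key with zero bytes so both have the same length.
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        byte[][] bounds = new byte[n + 1][];
        for (int i = 0; i <= n; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(n));
            bounds[i] = toFixed(lo.add(offset), len);
        }
        return bounds;
    }

    /** Convert a non-negative value to an unsigned big-endian array of exactly len bytes. */
    private static byte[] toFixed(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        byServer.put("rs1", Arrays.asList("r1", "r2", "r3"));
        byServer.put("rs2", Arrays.asList("r4"));
        System.out.println(packByServer(byServer, 2));

        for (byte[] key : splitRange(new byte[] {0x00}, new byte[] {0x10}, 4)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Grouping by server first keeps the data-locality benefit mentioned in the
issue. How a group of regions would map onto a single split's scan range, and
how a configured mapper count would be translated into regionsPerSplit or n,
is left open here.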



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
