Improve TableInputFormat to allow application to configure the number of mappers
--------------------------------------------------------------------------------

                 Key: HBASE-4063
                 URL: https://issues.apache.org/jira/browse/HBASE-4063
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Ming Ma
            Assignee: Ming Ma


TableInputFormat creates one split/mapper task per region. When a table has 
many small regions, the per-task overhead of the MapReduce framework becomes 
significant. Several related work items could address this issue.

1.      Reduce the number of small regions. 
https://issues.apache.org/jira/browse/HBASE-420 
2.      Improvement in map reduce framework to handle small jobs. 
https://issues.apache.org/jira/browse/MAPREDUCE-1220 

Another quick way to solve this is to improve TableInputFormat so that it 
can pack a configurable number of regions from a given region server into one 
mapper task. I tested this approach and saw a 40% improvement in map job 
latency.
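
A minimal sketch of the packing idea, assuming regions are first grouped by 
their hosting region server and then cut into chunks of a configurable size. 
The RegionPacker class, its Region type, and the regionsPerSplit parameter are 
hypothetical names used for illustration only; this is not the tested patch 
and it does not use the HBase API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionPacker {
    /** Hypothetical stand-in for a region: its name and its hosting server. */
    public static final class Region {
        final String name;
        final String server;

        public Region(String name, String server) {
            this.name = name;
            this.server = server;
        }
    }

    /**
     * Groups regions by server, then packs up to regionsPerSplit regions
     * from the same server into each split. Regions from different servers
     * are never mixed, so each split keeps a single preferred location.
     */
    public static List<List<String>> pack(List<Region> regions, int regionsPerSplit) {
        if (regionsPerSplit < 1) {
            throw new IllegalArgumentException("regionsPerSplit must be >= 1");
        }
        // LinkedHashMap keeps servers in first-seen order for stable output.
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (Region r : regions) {
            byServer.computeIfAbsent(r.server, k -> new ArrayList<>()).add(r.name);
        }
        List<List<String>> splits = new ArrayList<>();
        for (List<String> names : byServer.values()) {
            for (int i = 0; i < names.size(); i += regionsPerSplit) {
                int end = Math.min(i + regionsPerSplit, names.size());
                splits.add(new ArrayList<>(names.subList(i, end)));
            }
        }
        return splits;
    }
}
```

With three regions on one server, two on another, and regionsPerSplit set to 
2, this yields three splits instead of five.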


In addition, Ophir Cohen suggested supporting multiple mappers per region, as 
quoted below.

On Thu, Jun 30, 2011 at 8:38 AM, Ophir Cohen <[email protected]> wrote:
> Actually I thought of opposite version:
> If I have a spare map slots why not configure it to run more than one mapper
> on region?
> The question then is how to 'skip' the mappers to the needed places inside
> the regions.

Well, the current splitter passes mappers Scans where the start/end
rows are the region boundaries (at the time at which the splitter
ran).

To do your case,  in the splitter, you'd just give out multiple splits
per region.  To cut up the region key-space, you might use the
Bytes.split code.  It does coarse BigNumber math dividing the key
space.  See here:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/Bytes.html#1034

St.Ack
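
To illustrate the kind of big-number division of the key space described 
above, here is a standalone sketch. It is not the Bytes.split implementation; 
the KeyRangeSplitter class and its split method are hypothetical. It assumes 
both boundary keys are non-empty and that the start key sorts before the end 
key, so the empty boundary keys of a table's first and last regions would need 
separate handling.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class KeyRangeSplitter {
    /**
     * Returns parts + 1 boundary keys from start to end (inclusive),
     * spaced evenly when the keys are read as unsigned big-endian integers.
     * The shorter key is right-padded with zero bytes to a common length.
     */
    public static byte[][] split(byte[] start, byte[] end, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger range = hi.subtract(lo);
        byte[][] result = new byte[parts + 1][];
        for (int i = 0; i <= parts; i++) {
            // lo + range * i / parts, using integer division
            BigInteger offset = range.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(parts));
            result[i] = toFixedLength(lo.add(offset), len);
        }
        return result;
    }

    /** Converts a non-negative value to exactly len big-endian bytes. */
    private static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Each adjacent pair of returned keys could serve as the start/stop rows of one 
mapper's Scan. If the range is narrower than the requested number of parts, 
some boundaries repeat and the caller would have to drop the empty ranges.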


To support the scenarios of:
a) One mapper for multiple regions.
b) Multiple mappers for one region.


We can modify TableInputFormat to allow the application to configure the 
number of mappers. TableInputFormat will then calculate internally how to set 
each mapper's key range.
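
One possible shape for that internal calculation is sketched below. It assumes 
the application supplies only a desired mapper count; the MapperPlan class and 
its field names are hypothetical, and region server locality and uneven region 
sizes are ignored. The resulting number of splits may differ from the 
requested count because of rounding.

```java
public class MapperPlan {
    /** Regions packed into one mapper (scenario a); 1 when not packing. */
    public final int regionsPerSplit;
    /** Mappers assigned to one region (scenario b); 1 when not splitting. */
    public final int splitsPerRegion;

    private MapperPlan(int regionsPerSplit, int splitsPerRegion) {
        this.regionsPerSplit = regionsPerSplit;
        this.splitsPerRegion = splitsPerRegion;
    }

    public static MapperPlan plan(int regionCount, int desiredMappers) {
        if (regionCount < 1 || desiredMappers < 1) {
            throw new IllegalArgumentException("counts must be >= 1");
        }
        if (desiredMappers >= regionCount) {
            // Scenario b: at least one mapper per region; round down so the
            // total does not exceed the requested count.
            return new MapperPlan(1, desiredMappers / regionCount);
        }
        // Scenario a: fewer mappers than regions; round up so every region
        // is covered by some mapper.
        int perSplit = (regionCount + desiredMappers - 1) / desiredMappers;
        return new MapperPlan(perSplit, 1);
    }
}
```

For example, 100 regions and 10 desired mappers gives 10 regions per mapper, 
while 4 regions and 12 desired mappers gives 3 mappers per region.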

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
