[
https://issues.apache.org/jira/browse/HBASE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-4063.
----------------------------------------
Assignee: (was: Ming Ma)
Resolution: Incomplete
> Improve TableInputFormat to allow application to configure the number of
> mappers
> --------------------------------------------------------------------------------
>
> Key: HBASE-4063
> URL: https://issues.apache.org/jira/browse/HBASE-4063
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Ming Ma
> Priority: Major
>
> TableInputFormat creates one split/mapper task per region. With many small
> regions, the overhead of the MapReduce framework becomes significant.
> There are some related work items that could address this issue.
> 1. Reduce the number of small regions.
> https://issues.apache.org/jira/browse/HBASE-420
> 2. Improvement in map reduce framework to handle small jobs.
> https://issues.apache.org/jira/browse/MAPREDUCE-1220
> Another quick way to solve this is to just improve TableInputFormat so that
> it can pack a configurable number of regions from a given region server into
> one mapper task. I tested this approach and achieved a 40% improvement in
> map job latency.
> In addition, Ophir Cohen suggested support for multiple mappers per region as
> below.
> On Thu, Jun 30, 2011 at 8:38 AM, Ophir Cohen wrote:
> > Actually I thought of the opposite version:
> > If I have spare map slots, why not configure it to run more than one mapper
> > per region?
> > The question then is how to 'skip' the mappers to the needed places inside
> > the regions.
> Well, the current splitter passes mappers Scans where the start/end
> rows are the region boundaries (at the time at which the splitter
> ran).
> To do your case, in the splitter, you'd just give out multiple splits
> per region. To cut up the region key-space, you might use the
> Bytes.split code. It does coarse BigNumber math dividing the key
> space. See here:
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/Bytes.html#1034
> St.Ack
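
The key-space division described above can be sketched in plain Java. This is only an illustration of the idea, not the HBase Bytes.split code: the class KeyRangeSplitter and its methods are hypothetical names, the keys are treated as unsigned big-endian integers, and the shorter boundary is right-padded with zero bytes. It assumes a non-empty end key, so the last region of a table (whose end key is empty) would need separate handling.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: cuts the key range [start, end)
// into n sub-ranges of roughly equal numeric width.
final class KeyRangeSplitter {

    // Interpret a key as an unsigned big-endian integer.
    static BigInteger toInt(byte[] key) {
        return new BigInteger(1, key);
    }

    // Convert a value back to a fixed-width unsigned big-endian key.
    static byte[] toKey(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be shorter than len
        byte[] out = new byte[len];
        int n = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - n, out, len - n, n);
        return out;
    }

    // Returns n + 1 boundaries: start, n - 1 interior points, end.
    // Adjacent pairs form the start/stop rows of n scans.
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        int len = Math.max(start.length, end.length);
        // Right-pad the shorter key with zero bytes so both have equal width.
        BigInteger lo = toInt(Arrays.copyOf(start, len));
        BigInteger hi = toInt(Arrays.copyOf(end, len));
        if (n < 1 || lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("need n >= 1 and start < end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> out = new ArrayList<>();
        out.add(start);
        for (int i = 1; i < n; i++) {
            // lo + span * i / n, using integer division
            BigInteger p = lo.add(span.multiply(BigInteger.valueOf(i))
                                      .divide(BigInteger.valueOf(n)));
            out.add(toKey(p, len));
        }
        out.add(end);
        return out;
    }
}
```

For example, KeyRangeSplitter.boundaries(new byte[]{0x00}, new byte[]{0x40}, 4) yields the boundaries 0x00, 0x10, 0x20, 0x30, 0x40. Equal numeric width does not imply equal row counts, so skewed keys would still produce uneven mappers, and a span narrower than n produces repeated boundaries that a caller would have to drop.
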
> To support the scenarios of:
> a) One mapper for multiple regions.
> b) Multiple mappers for one region.
> We can modify TableInputFormat to allow the application to configure the
> number of mappers. TableInputFormat will then calculate internally how to set
> each mapper's key range.
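
Scenario a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch tested in the issue: RegionPacker, Region, and the regionsPerMapper parameter are hypothetical names, and regions are represented only by a name and the server hosting them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase API: groups regions by hosting server, then
// emits one combined split for every `regionsPerMapper` regions on that server.
final class RegionPacker {

    // Minimal stand-in for a region: its name and the server it lives on.
    static final class Region {
        final String name;
        final String server;

        Region(String name, String server) {
            this.name = name;
            this.server = server;
        }
    }

    // Each inner list holds the regions that one mapper task would scan.
    static List<List<String>> pack(List<Region> regions, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        // Group by server, keeping first-seen order, so a packed split stays local.
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (Region r : regions) {
            byServer.computeIfAbsent(r.server, s -> new ArrayList<>()).add(r.name);
        }
        List<List<String>> splits = new ArrayList<>();
        for (List<String> names : byServer.values()) {
            // Chunk this server's regions; the last chunk may be smaller.
            for (int i = 0; i < names.size(); i += regionsPerMapper) {
                int to = Math.min(i + regionsPerMapper, names.size());
                splits.add(new ArrayList<>(names.subList(i, to)));
            }
        }
        return splits;
    }
}
```

With regions r1, r3, r4 on server A and r2, r5 on server B, pack(regions, 2) gives three splits: [r1, r3], [r4], and [r2, r5]. Regions packed this way are not necessarily adjacent in the key space, so a mapper handling such a split would presumably need one scan per region rather than a single start/end row pair.
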