Modify TableInputFormat splitting algorithm to allow any number of mappers
--------------------------------------------------------------------------
Key: HBASE-1172
URL: https://issues.apache.org/jira/browse/HBASE-1172
Project: Hadoop HBase
Issue Type: Improvement
Components: mapred
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Fix For: 0.19.1, 0.20.0
Currently, the number of mappers specified when using TableInputFormat is
strictly followed if it is less than the total number of regions in the input
table. If it is greater, the number of regions is used instead.
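A minimal Java sketch of the current behavior described above. It assumes the
existing algorithm gives each split a contiguous run of regions and hands the
leftover regions to the first splits; the class and method names
(CurrentSplitSketch, effectiveSplits, regionsPerSplit) are hypothetical and are
not TableInputFormat's actual API.

```java
/**
 * Hypothetical sketch of the current rule: never more splits than regions.
 * Assumption: each split covers a contiguous run of regions.
 */
final class CurrentSplitSketch {

    /** Number of splits actually used: the request, capped at the region count. */
    static int effectiveSplits(int requestedMappers, int regionCount) {
        return Math.min(requestedMappers, regionCount);
    }

    /**
     * Returns how many consecutive regions each split covers. The first
     * (regionCount % splits) splits each receive one extra region.
     */
    static int[] regionsPerSplit(int regionCount, int splits) {
        if (splits <= 0 || regionCount <= 0) {
            // With the current rule a request of 0 would yield 0 splits.
            throw new IllegalArgumentException("need splits > 0 and regionCount > 0");
        }
        int[] counts = new int[splits];
        for (int i = 0; i < splits; i++) {
            counts[i] = regionCount / splits + (i < regionCount % splits ? 1 : 0);
        }
        return counts;
    }
}
```

For example, asking for 4 mappers on a 10-region table gives
effectiveSplits(4, 10) == 4 and regionsPerSplit(10, 4) == [3, 3, 2, 2], while
asking for 20 mappers is capped at 10.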
This will modify the splitting algorithm to do the following:
- Specify 0 mappers when you want # mappers = # regions
- If you specify fewer mappers than regions, exactly the number you specify
will be used, as with the current algorithm
- If you specify more mappers than regions, the regions will be divided up by
determining split points within them, e.g. [start,X) [X,end). The number of
mappers will always be a multiple of the number of regions. This ensures that
no scanner spans multiple regions.
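The proposed rules can be sketched in Java as follows. This is a hypothetical
illustration, not the patch: row keys are modelled as longs purely for
simplicity (real HBase row keys are byte arrays, and the issue does not say how
the split key X is chosen), and rounding a request down to the nearest multiple
of the region count is an assumption. ProposedSplitSketch, mapperCount and
subdivide are invented names.

```java
/**
 * Hypothetical sketch of the proposed splitting rules. Assumption: row keys
 * are modelled as longs in [start, end); real HBase keys are byte arrays.
 */
final class ProposedSplitSketch {

    /** Decides how many map tasks to create for a table with regionCount regions. */
    static int mapperCount(int requestedMappers, int regionCount) {
        if (requestedMappers < 0 || regionCount <= 0) {
            throw new IllegalArgumentException("need requestedMappers >= 0 and regionCount > 0");
        }
        if (requestedMappers == 0) {
            return regionCount;          // 0 means one mapper per region
        }
        if (requestedMappers <= regionCount) {
            return requestedMappers;     // exactly the requested number
        }
        // Assumption: round down so the total is a multiple of the region
        // count and every region is cut into the same number of pieces.
        return (requestedMappers / regionCount) * regionCount;
    }

    /**
     * Cuts one region [start, end) into `parts` contiguous sub-ranges and
     * returns the parts + 1 boundaries; no sub-range crosses the region edge.
     */
    static long[] subdivide(long start, long end, int parts) {
        if (parts <= 0 || end <= start) {
            throw new IllegalArgumentException("need parts > 0 and end > start");
        }
        long span = end - start;
        long[] bounds = new long[parts + 1];
        for (int i = 0; i <= parts; i++) {
            // Spread the remainder over the first sub-ranges; the last
            // boundary always equals end, and i * (span / parts) <= span.
            bounds[i] = start + i * (span / parts) + Math.min(i, span % parts);
        }
        return bounds;
    }
}
```

For example, with 5 regions, mapperCount(12, 5) returns 10, so each region is
cut in two; subdivide(0, 10, 3) returns [0, 4, 7, 10], i.e. the sub-ranges
[0,4) [4,7) [7,10).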
There is an additional issue: the default number of mappers in JobConf is 1.
That means if a user does not explicitly set the number of map tasks, a single
mapper will be used. I'm going to deal with that in a separate JIRA, because
the issue already exists today, there are a number of ways to implement a fix,
and it's not required to complete this issue.