Modify TableInputFormat splitting algorithm to allow any number of mappers
--------------------------------------------------------------------------

                 Key: HBASE-1172
                 URL: https://issues.apache.org/jira/browse/HBASE-1172
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: mapred
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.19.1, 0.20.0


Currently, the number of mappers specified when using TableInputFormat is 
honored only if it is less than the total number of regions in the input 
table.  If it is greater, the number of regions is used instead.

This will modify the splitting algorithm to do the following:

- Specify 0 mappers when you want # mappers = # regions
- If you specify fewer mappers than regions, exactly the number you specify 
will be used, with regions grouped according to the current algorithm
- If you specify more mappers than regions, each region will be divided up by 
determining split points, e.g. [start,X) [X,end).  The number of mappers will 
always be a multiple of the number of regions.  This is so we do not have 
scanners spanning multiple regions.
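
The three rules above can be sketched roughly as follows.  This is only an 
illustration, not the patch itself: SplitPlanSketch, splitCount and 
divideRegion are hypothetical names, row keys are modeled as longs rather 
than byte arrays, and rounding the requested mapper count down to the 
nearest multiple of the region count is an assumption (the description only 
says the result is a multiple).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: illustrates the mapper-count rules only.
class SplitPlanSketch {

    // Returns how many input splits (and hence map tasks) would be produced.
    static int splitCount(int requestedMappers, int regionCount) {
        if (regionCount <= 0 || requestedMappers < 0) {
            throw new IllegalArgumentException("need >= 1 region and >= 0 mappers");
        }
        if (requestedMappers == 0) {
            return regionCount;          // 0 means one mapper per region
        }
        if (requestedMappers <= regionCount) {
            return requestedMappers;     // regions get grouped into this many splits
        }
        // More mappers than regions. ASSUMPTION: round down to a multiple of
        // regionCount so every region is cut into the same number of pieces.
        int piecesPerRegion = requestedMappers / regionCount;
        return piecesPerRegion * regionCount;
    }

    // Cuts the half-open range [start, end) into `pieces` contiguous half-open
    // sub-ranges, e.g. [start,X) [X,end) for pieces == 2.
    // ASSUMPTION: keys are modeled as longs; real row keys are byte arrays.
    static List<long[]> divideRegion(long start, long end, int pieces) {
        if (pieces <= 0 || end - start < pieces) {
            throw new IllegalArgumentException("range too small for requested pieces");
        }
        List<long[]> ranges = new ArrayList<>();
        long width = end - start;
        long lower = start;
        for (int i = 1; i <= pieces; i++) {
            long upper = start + (width * i) / pieces;  // equals end when i == pieces
            ranges.add(new long[] {lower, upper});
            lower = upper;
        }
        return ranges;
    }
}
```

Under these assumptions, asking for 10 mappers on a 4-region table would 
yield 8 splits (2 per region), and no split ever crosses a region boundary 
because each region is divided independently.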

There is an additional issue in that the default number of mappers in JobConf 
is set to 1.  That means if a user does not explicitly set the number of map 
tasks, a single mapper will be used.  I'm going to deal with that in a 
separate JIRA: the problem already exists today, there are a number of ways 
to fix it, and it's not required to complete this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.