[ 
https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020265#comment-16020265
 ] 

Ted Yu commented on HBASE-18090:
--------------------------------

For the new TableMapReduceUtil#initTableSnapshotMapJob method (in the mapred 
package), please document numSplitsPerRegion with an @param tag.
{code}
+    } else if (RegionSplitter.HexStringSplit.class.getSimpleName().equals(conf.get(SPLIT_ALGO))) {
+      splitAlgo = new RegionSplitter.HexStringSplit();
+    }
{code}
Add an else block to handle the case where the split algorithm is not specified.
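One possible shape for that else handling, as a rough sketch only. The types below are hypothetical stand-ins for the RegionSplitter algorithms (not the HBase classes), a plain Map stands in for the configuration, and the key string is made up. Treating a missing value as "no algorithm" and rejecting an unrecognized name is an assumption, not something the patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the RegionSplitter algorithms; not the HBase classes.
interface SplitAlgorithm {}
class UniformSplit implements SplitAlgorithm {}
class HexStringSplit implements SplitAlgorithm {}

class SplitAlgoSelection {
  // Hypothetical configuration key, used only for this sketch.
  static final String SPLIT_ALGO = "example.snapshot.split.algorithm";

  static SplitAlgorithm select(Map<String, String> conf) {
    String name = conf.get(SPLIT_ALGO);
    if (UniformSplit.class.getSimpleName().equals(name)) {
      return new UniformSplit();
    } else if (HexStringSplit.class.getSimpleName().equals(name)) {
      return new HexStringSplit();
    } else if (name == null) {
      // Assumption: not specified means "no split algorithm"; the caller validates later.
      return null;
    } else {
      // Assumption: an unrecognized name is reported instead of being silently ignored.
      throw new IllegalArgumentException("Unknown split algorithm: " + name);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(SPLIT_ALGO, "HexStringSplit");
    System.out.println(select(conf).getClass().getSimpleName());
  }
}
```

Whether an unrecognized name should throw or fall back to a default is a design choice for the patch author.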
{code}
+    if (splitAlgo == null && numSplitsPerRegion > 1) {
+      throw new IllegalArgumentException("Split algo can't be null, numSplits must be >= 1!");
{code}
The condition seems to imply that numSplits can be 1 if splitAlgo is null. 
Please modify the error message to be more precise.
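For illustration, the check could be split so that each message matches the condition that triggers it. This is a minimal sketch with hypothetical names (SplitArgsCheck, validate), and the separate lower-bound check on numSplitsPerRegion is an assumption suggested by the original message, not part of the quoted diff.

```java
class SplitArgsCheck {
  // splitAlgo is typed as Object so the sketch needs no HBase classes.
  static void validate(Object splitAlgo, int numSplitsPerRegion) {
    if (numSplitsPerRegion < 1) {
      // Assumption: values below 1 are rejected separately.
      throw new IllegalArgumentException(
          "numSplitsPerRegion must be >= 1, got " + numSplitsPerRegion);
    }
    if (splitAlgo == null && numSplitsPerRegion > 1) {
      // The message now states exactly the condition being enforced.
      throw new IllegalArgumentException(
          "A split algorithm is required when numSplitsPerRegion > 1, got "
              + numSplitsPerRegion);
    }
  }

  public static void main(String[] args) {
    validate(null, 1); // allowed: one split per region needs no algorithm
    try {
      validate(null, 2);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```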

> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. 
> This imposes an unnecessary restriction: the region layout of the original 
> table has to take into account the processing resources available to the MR 
> job. Allowing multiple mappers per region (assuming a reasonably even key 
> distribution) would be useful.



