[ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020265#comment-16020265 ]
Ted Yu commented on HBASE-18090:
--------------------------------
For the new TableMapReduceUtil#initTableSnapshotMapJob method (in the mapred
package), please add an @param entry for numSplitsPerRegion to the javadoc.
{code}
+ } else if (RegionSplitter.HexStringSplit.class.getSimpleName().equals(conf.get(SPLIT_ALGO))) {
+   splitAlgo = new RegionSplitter.HexStringSplit();
+ }
{code}
Add an else block to handle the case where the split algorithm is not specified.
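Something along these lines is what I have in mind. This is only a sketch: SplitAlgorithm, HexStringSplit and UniformSplit below are hypothetical stand-ins for the RegionSplitter classes so that it compiles on its own, the leading UniformSplit branch is my assumption about the part of the patch not quoted above, and SplitAlgoResolver is an illustrative name. Throwing for an unrecognized value is one possible choice; falling back to a default would be another.

```java
// Hypothetical stand-ins for RegionSplitter.SplitAlgorithm and its implementations,
// so the sketch compiles without HBase on the classpath.
interface SplitAlgorithm {}
final class HexStringSplit implements SplitAlgorithm {}
final class UniformSplit implements SplitAlgorithm {}

final class SplitAlgoResolver {
  private SplitAlgoResolver() {}

  /**
   * Maps a configured simple class name to a split algorithm.
   * Returns null when nothing is configured; rejects names it does not recognize.
   */
  static SplitAlgorithm resolve(String configured) {
    SplitAlgorithm splitAlgo;
    if (configured == null || configured.isEmpty()) {
      // Not specified: the caller keeps one split per region.
      splitAlgo = null;
    } else if (UniformSplit.class.getSimpleName().equals(configured)) {
      splitAlgo = new UniformSplit();
    } else if (HexStringSplit.class.getSimpleName().equals(configured)) {
      splitAlgo = new HexStringSplit();
    } else {
      // Explicit else: fail loudly instead of silently leaving splitAlgo null.
      throw new IllegalArgumentException("Unsupported split algorithm: " + configured);
    }
    return splitAlgo;
  }
}
```

With this shape, a typo in the configured name surfaces at job setup time instead of showing up later as a null split algorithm.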
{code}
+ if (splitAlgo == null && numSplitsPerRegion > 1) {
+   throw new IllegalArgumentException("Split algo can't be null, numSplits must be >= 1!");
{code}
The condition seems to imply that numSplits can be 1 if splitAlgo is null.
Please modify the error message to be more precise.
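As a rough illustration of more precise wording, the two conditions could be reported separately. SplitArgChecker and the message texts are hypothetical, not taken from the patch, and the split algorithm is typed as Object only to keep the sketch free of HBase dependencies.

```java
// Hypothetical helper, not part of the patch: validates the two arguments together
// and reports each failure with a message that matches the condition checked.
final class SplitArgChecker {
  private SplitArgChecker() {}

  static void check(Object splitAlgo, int numSplitsPerRegion) {
    if (numSplitsPerRegion < 1) {
      throw new IllegalArgumentException(
          "numSplitsPerRegion must be >= 1, got " + numSplitsPerRegion);
    }
    // A null split algorithm is acceptable only when each region maps to a single split.
    if (splitAlgo == null && numSplitsPerRegion > 1) {
      throw new IllegalArgumentException(
          "A split algorithm is required when numSplitsPerRegion > 1, got numSplitsPerRegion="
              + numSplitsPerRegion);
    }
  }
}
```

Here check(null, 1) passes, which makes the "null is fine for a single split" case explicit instead of leaving it implied by the condition.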
> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
> Key: HBASE-18090
> URL: https://issues.apache.org/jira/browse/HBASE-18090
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 1.4.0
> Reporter: Mikhail Antonov
> Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot.
> This places an unnecessary restriction: the region layout of the original
> table has to take into account the processing resources available to the
> MR job. Allowing multiple mappers per region (assuming a reasonably even
> key distribution) would be useful.