[ 
https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025520#comment-16025520
 ] 

Mikhail Antonov commented on HBASE-18090:
-----------------------------------------

I actually just want to decouple two things: 1) the decision making that guides 
the table's split layout on the cluster running HBase and serving traffic to 
applications, and 2) the amount of parallelism available to batch processing.

So it's not necessarily about knowing the size of regions; it's about knowing 
that if the number of regions in the snapshot is X and the number of MapReduce 
slots is, for example, 5X, then I can run 5 tasks per region.
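
To make the arithmetic concrete, here is a minimal sketch of that calculation. 
The class and method names are hypothetical and are not part of HBase or of the 
attached patch; the only assumption is that the job knows the region count of 
the snapshot and the number of map slots it may use.

```java
public final class MappersPerRegion {

    /**
     * Hypothetical helper: number of map tasks to run per region, given the
     * region count of the snapshot and the map slots available to the job.
     */
    static int tasksPerRegion(int regionsInSnapshot, int mapSlots) {
        if (regionsInSnapshot <= 0) {
            throw new IllegalArgumentException("snapshot must contain at least one region");
        }
        // Integer division; never go below one task per region,
        // even when there are fewer slots than regions.
        return Math.max(1, mapSlots / regionsInSnapshot);
    }

    public static void main(String[] args) {
        int regions = 20;         // X: regions in the snapshot
        int slots = 5 * regions;  // 5X: MapReduce slots available
        System.out.println(tasksPerRegion(regions, slots)); // prints 5
    }
}
```

The point is that the result depends only on the region count and the slot 
count, not on region sizes or on how the live table happens to be split.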

> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. 
> This places an unnecessary restriction: the region layout of the original 
> table needs to take the processing resources available to the MR job into 
> consideration. Allowing multiple mappers to run per region (assuming a 
> reasonably even key distribution) would be useful.
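
One way to picture "multiple mappers per region" is to cut a region's key range 
into equal-width sub-ranges, one per mapper. The sketch below is only an 
illustration and is not taken from the attached patch: it assumes row keys can 
be treated as unsigned integers and are roughly uniformly distributed, and the 
class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public final class RegionRangeSplitter {

    /**
     * Returns n + 1 boundaries that divide [start, end) into n sub-ranges of
     * near-equal width. Assumes keys are comparable as unsigned integers.
     */
    static List<BigInteger> boundaries(BigInteger start, BigInteger end, int n) {
        if (n < 1 || start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("need n >= 1 and start < end");
        }
        BigInteger width = end.subtract(start);
        BigInteger parts = BigInteger.valueOf(n);
        List<BigInteger> out = new ArrayList<>(n + 1);
        for (int i = 0; i <= n; i++) {
            // start + width * i / n; i == 0 yields start, i == n yields end.
            out.add(start.add(width.multiply(BigInteger.valueOf(i)).divide(parts)));
        }
        return out;
    }

    public static void main(String[] args) {
        // A region covering keys [0x1000, 0x2000), divided among 4 mappers.
        List<BigInteger> b =
            boundaries(BigInteger.valueOf(0x1000), BigInteger.valueOf(0x2000), 4);
        for (int i = 0; i + 1 < b.size(); i++) {
            System.out.printf("[%x, %x)%n", b.get(i), b.get(i + 1));
        }
    }
}
```

Each consecutive pair of boundaries would become the start and stop row of one 
mapper's scan. If the keys are skewed, equal-width ranges give uneven work, 
which is why the description qualifies the idea with "reasonably even key 
distribution".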



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
