[jira] [Commented] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region

Mikhail Antonov (JIRA) Thu, 25 May 2017 13:44:23 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025363#comment-16025363
 ]


Mikhail Antonov commented on HBASE-18090:
-----------------------------------------

Oh, thanks for the reference! I hadn't seen that one. I don't see any patches 
there, so maybe this one will do some good.

Any feedback on the patch? I think that, assuming a reasonably even key 
distribution across regions, specifying just the number of splits per region 
and a split algorithm should suffice. That is simpler and cheaper than trying 
to compute the actual distribution based on the data in the HFiles.
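
To illustrate the idea (this is not the code from the attached patch), here is a 
minimal, self-contained sketch of cutting one region's key range into a fixed 
number of evenly sized sub-ranges. The class name RegionSubSplits and the method 
uniformBoundaries are hypothetical, and the sketch assumes both region 
boundaries are non-empty byte arrays that can be treated as big-endian unsigned 
integers, which only matches the real data if keys are reasonably uniform.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of HBase.
public class RegionSubSplits {

    // Convert a non-negative value back to exactly len bytes, big-endian.
    static byte[] toFixedWidth(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be shorter than len
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    // Returns numSplits + 1 boundaries; boundaries i and i + 1 delimit the
    // range one mapper would scan. Assumes start < end and a non-empty end key
    // (the last region of a table, whose end key is empty, is not handled).
    static List<byte[]> uniformBoundaries(byte[] start, byte[] end, int numSplits) {
        if (numSplits < 1 || end.length == 0) {
            throw new IllegalArgumentException("need numSplits >= 1 and a non-empty end key");
        }
        // Right-pad the shorter key with zero bytes so both have the same width.
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start key must sort before end key");
        }
        BigInteger range = hi.subtract(lo);
        BigInteger n = BigInteger.valueOf(numSplits);
        List<byte[]> out = new ArrayList<>();
        for (int i = 0; i <= numSplits; i++) {
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(n);
            out.add(toFixedWidth(lo.add(offset), len));
        }
        // Keep the region's original boundaries rather than the padded copies.
        out.set(0, start);
        out.set(numSplits, end);
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> bounds = uniformBoundaries(new byte[] {0x00}, new byte[] {0x40}, 4);
        for (byte[] b : bounds) {
            System.out.println(Arrays.toString(b));
        }
    }
}
```

Note that if the key range is narrower than the requested number of splits, 
some boundaries will coincide, so a real implementation would need to drop 
empty sub-ranges.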

I still need to address the feedback from [~tedyu], as well as some rough edges 
around how we create recovered.edits files during the openRegion sequence.

cc [~enis] [~esteban] ?

> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. 
> This places an unnecessary restriction: the region layout of the original 
> table has to take into account the processing resources available to the MR 
> job. Allowing multiple mappers per region (assuming reasonably even key 
> distribution) would be useful.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
