[ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038521#comment-16038521 ]

Mikhail Antonov commented on HBASE-18090:
-----------------------------------------

Thanks [[email protected]] and [~easyliangjob] for the reviews! I'll address them shortly.

I made my patch against branch-1.3, so I'm not sure why you couldn't apply it locally. Merge conflicts?

I found an issue with the current patch: if we try to open the same region from several tasks, we hit a race in this code:

{code}
        at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1154)
        at org.apache.hadoop.hbase.wal.WALSplitter.writeRegionSequenceIdFile(WALSplitter.java:740)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:876)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:802)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6708)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6669)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6640)
        at org.apache.hadoop.hbase.client.ClientSideRegionScanner.<init>(ClientSideRegionScanner.java:60)
{code}
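
A minimal sketch of the race, assuming FileSystem.createNewFile is a check-then-create (an exists() check followed by a non-overwriting create()), as we believe it is in Hadoop 2.x. It uses java.nio.file as a local stand-in for the Hadoop FileSystem API, and MarkerFiles is a hypothetical helper, not HBase code. Two tasks can both pass the exists() check, and whichever loses the create() gets an exception instead of false. The tolerant variant relies on the atomic create and treats "already exists" as a normal outcome.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; java.nio.file stands in for Hadoop's FileSystem here. */
final class MarkerFiles {
    private MarkerFiles() {}

    /**
     * Check-then-create: two tasks can both see "absent" and then both try to create.
     * The loser's createFile() throws FileAlreadyExistsException instead of returning false.
     */
    static boolean createNewFileRacy(Path p) throws IOException {
        if (Files.exists(p)) {
            return false;
        }
        Files.createFile(p); // another task may create p between the check above and here
        return true;
    }

    /**
     * Race-tolerant: Files.createFile checks and creates atomically, so we only
     * translate "already exists" into false. Every concurrent caller returns
     * normally and exactly one of them reports true.
     */
    static boolean createIfAbsent(Path p) throws IOException {
        try {
            Files.createFile(p);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }
}
```

On HDFS the error a losing task sees may be a different exception type, so a real fix would need to check what the NameNode actually throws for concurrent creators.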

Why do we need to go through this code path at all if we know the region is in read-only mode?
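
A rough sketch of what skipping that path could look like. RegionOpenSketch, its readOnly flag and the "<seqid>.seqid" marker name are all hypothetical and only loosely mimic the shape of initializeRegionInternals. The idea is that a read-only open (like the client-side scan over snapshot files) still computes the next sequence id in memory but never touches the filesystem, so concurrent tasks have nothing to race on.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical, heavily simplified region-open step; not HBase's HRegion code. */
final class RegionOpenSketch {
    private final Path regionDir;
    private final boolean readOnly;

    RegionOpenSketch(Path regionDir, boolean readOnly) {
        this.regionDir = regionDir;
        this.readOnly = readOnly;
    }

    /** Computes the next sequence id; only persists it when the region can take writes. */
    long initialize(long maxSeqIdFromStores) throws IOException {
        long nextSeqId = maxSeqIdFromStores + 1;
        if (readOnly) {
            // Nothing will ever be written to this region, so there is no reason to
            // persist the sequence id (and nothing for concurrent openers to race on).
            return nextSeqId;
        }
        Files.createDirectories(regionDir);
        Path marker = regionDir.resolve((nextSeqId - 1) + ".seqid"); // hypothetical marker name
        try {
            Files.createFile(marker);
        } catch (FileAlreadyExistsException e) {
            // Another opener already wrote the same marker; that is fine.
        }
        return nextSeqId;
    }
}
```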



> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. 
> This places an unnecessary restriction: the region layout of the original 
> table needs to take the processing resources available to the MR job into 
> consideration. Allowing multiple mappers to run per region (assuming a 
> reasonably even key distribution) would be useful.
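
One way to get several mappers out of a single region, assuming the reasonably even key distribution mentioned above: cut the region's [startKey, endKey) range into n equal slices by treating the keys as unsigned big-endian integers. RegionKeySplitter is a hypothetical helper; HBase's Bytes.split utility does similar BigInteger math, but this sketch doesn't depend on HBase. The last region's empty end key is not handled here, and skewed keys will still give unbalanced splits.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: cuts one region's [startKey, endKey) into n sub-ranges,
 * assuming row keys are spread roughly evenly over that range.
 */
final class RegionKeySplitter {
    private RegionKeySplitter() {}

    /** Returns n + 1 boundaries; consecutive pairs are the [start, end) of each sub-range. */
    static List<byte[]> split(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (endKey.length == 0) {
            // An empty end key means "end of table"; not handled in this sketch.
            throw new IllegalArgumentException("empty end key not supported");
        }
        // Zero-pad both keys to the same length so they compare as unsigned integers.
        int len = Math.max(startKey.length, endKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must be < endKey after padding");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> boundaries = new ArrayList<>(n + 1);
        boundaries.add(startKey.clone());
        for (int i = 1; i < n; i++) {
            // lo + span * i / n, rounded down
            BigInteger point = lo.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            boundaries.add(toFixedLength(point, len));
        }
        boundaries.add(endKey.clone());
        return boundaries;
    }

    /** Big-endian encoding of v in exactly len bytes (v is known to fit). */
    private static byte[] toFixedLength(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry an extra leading sign byte
        byte[] out = new byte[len];
        int count = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - count, out, len - count, count);
        return out;
    }
}
```

For example, splitting [0x00, 0x80) four ways yields boundaries 0x00, 0x20, 0x40, 0x60, 0x80, i.e. four mappers for one region, each scanning its own sub-range.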



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
