[ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xinxin fan updated HBASE-18090:
-------------------------------
    Release Note: 
In this task, we make it possible to run multiple mappers per region in the table snapshot.

The following code is the existing table snapshot mapper initialization:

    TableMapReduceUtil.initTableSnapshotMapperJob(
      snapshotName,      // The name of the snapshot (of a table) to read from
      scan,              // Scan instance to control CF and attribute selection
      mapper,            // mapper
      outputKeyClass,    // mapper output key
      outputValueClass,  // mapper output value
      job,               // The current job to adjust
      true,              // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
      restoreDir         // a temporary directory to copy the snapshot files into
    );

With this call, the job runs only one map task per region in the table snapshot. With this feature, the client can specify the desired number of mappers when initializing the table snapshot mapper job:

    TableMapReduceUtil.initTableSnapshotMapperJob(
      snapshotName,      // The name of the snapshot (of a table) to read from
      scan,              // Scan instance to control CF and attribute selection
      mapper,            // mapper
      outputKeyClass,    // mapper output key
      outputValueClass,  // mapper output value
      job,               // The current job to adjust
      true,              // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
      restoreDir,        // a temporary directory to copy the snapshot files into
      splitAlgorithm,    // algorithm used to split a region; currently only RegionSplitter.UniformSplit and RegionSplitter.HexStringSplit are supported
      n                  // number of input splits to generate per region
    );

  was:
In this task, we make it possible to run multiple mappers per region in the table snapshot.

The following code is the existing table snapshot mapper initialization:

    TableMapReduceUtil.initTableSnapshotMapperJob(
      snapshotName,      // The name of the snapshot (of a table) to read from
      scan,              // Scan instance to control CF and attribute selection
      mapper,            // mapper
      outputKeyClass,    // mapper output key
      outputValueClass,  // mapper output value
      job,               // The current job to adjust
      true,              // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
      restoreDir         // a temporary directory to copy the snapshot files into
    );

With this call, the job runs only one map task per region in the table snapshot. With this feature, the client can specify the desired number of mappers when initializing the table snapshot mapper job:

    TableMapReduceUtil.initTableSnapshotMapperJob(
      snapshotName,      // The name of the snapshot (of a table) to read from
      scan,              // Scan instance to control CF and attribute selection
      mapper,            // mapper
      outputKeyClass,    // mapper output key
      outputValueClass,  // mapper output value
      job,               // The current job to adjust
      true,              // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
      restoreDir,        // a temporary directory to copy the snapshot files into
      splitAlgorithm,    // algorithm used to split a region; currently only RegionSplitter.UniformSplit and RegionSplitter.HexStringSplit are supported
      n                  // number of input splits to generate per region
    );


> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Mikhail Antonov
>            Assignee: xinxin fan
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18090-V3-master.patch,
> HBASE-18090-V4-master.patch, HBASE-18090-V5-master.patch,
> HBASE-18090-branch-1-v2.patch, HBASE-18090-branch-1-v2.patch,
> HBASE-18090-branch-1.3-v1.patch,
> HBASE-18090-branch-1.3-v2.patch, HBASE-18090.branch-1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot.
> This places an unnecessary restriction: the region layout of the original
> table needs to take the processing resources available to the MR job into
> consideration. Allowing multiple mappers per region (assuming
> reasonably even key distribution) would be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
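
For readers unfamiliar with how a region can be cut into n input splits, the following is a minimal, self-contained sketch of the idea behind a uniform split: treat the region's start and end row keys as unsigned integers and place n - 1 evenly spaced boundaries between them. This is not the HBase RegionSplitter.UniformSplit implementation; the class name UniformRangeSplit and its methods are hypothetical and exist only for illustration. It assumes both keys are non-empty and that the start key sorts before the end key, so the empty keys of the first and last regions are not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
public class UniformRangeSplit {

    /**
     * Returns n + 1 boundary keys that divide [start, end) into n
     * roughly equal sub-ranges. Keys are compared as unsigned,
     * big-endian integers after right-padding to the same width.
     */
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int width = Math.max(start.length, end.length);
        // Arrays.copyOf right-pads the shorter key with zero bytes.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> out = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // boundary_i = lo + span * i / n
            BigInteger b = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            out.add(toFixedWidth(b, width));
        }
        return out;
    }

    /** Encodes a non-negative value as exactly `width` big-endian bytes. */
    static byte[] toFixedWidth(BigInteger v, int width) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Split the one-byte key range [0x00, 0x80) into 4 sub-ranges.
        List<byte[]> keys = boundaries(new byte[] {0x00}, new byte[] {(byte) 0x80}, 4);
        for (byte[] k : keys) {
            System.out.println(String.format("%02x", k[0] & 0xff));
        }
    }
}
```

Each adjacent pair of boundaries would then back one input split, so a region yields n map tasks instead of one. As the issue description notes, this only balances work well when row keys are reasonably evenly distributed within the region.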