[
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963167#comment-13963167
]
Jean-Daniel Cryans commented on HBASE-10932:
--------------------------------------------
It's probably a bug that TableInputFormatBase doesn't do it. Looking at the old
one (in org.apache.hadoop.hbase.mapred), you can see that it does this:
{code}
* Splits are created in number equal to the smallest between numSplits and
* the number of {@link HRegion}s in the table. If the number of splits is
* smaller than the number of {@link HRegion}s then splits are spanned across
* multiple {@link HRegion}s and are grouped the most evenly possible. In the
* case splits are uneven the bigger splits are placed first in the
* {@link InputSplit} array.
{code}
And you don't need a new parameter.
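
To make the grouping rule in that comment concrete, here is a minimal standalone
sketch. It is not the actual org.apache.hadoop.hbase.mapred code: the class and
method names are hypothetical, and regions are represented only by their index
so the example has no HBase dependency. It takes the smaller of numSplits and
the region count, spreads the regions as evenly as possible over contiguous
ranges, and puts the bigger groups first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
public class RegionGrouping {

    /**
     * Groups region indices [0, numRegions) into contiguous splits.
     * Each returned element is {firstRegion, lastRegionExclusive}.
     */
    static List<int[]> groupRegions(int numRegions, int numSplits) {
        List<int[]> splits = new ArrayList<>();
        if (numRegions <= 0 || numSplits <= 0) {
            return splits;
        }
        // Never create more splits than there are regions.
        int realNumSplits = Math.min(numSplits, numRegions);
        int base = numRegions / realNumSplits;      // regions every split gets
        int remainder = numRegions % realNumSplits; // splits that get one extra
        int start = 0;
        for (int i = 0; i < realNumSplits; i++) {
            // The first 'remainder' splits are one region bigger, so the
            // bigger splits come first in the result.
            int size = base + (i < remainder ? 1 : 0);
            splits.add(new int[] {start, start + size});
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        for (int[] s : groupRegions(10, 4)) {
            System.out.println("regions [" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

With 10 regions and 4 requested splits, this yields groups of 3, 3, 2 and 2
regions. In a real input format each group would presumably become one split
spanning from the start key of its first region to the end key of its last.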
> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
> Key: HBASE-10932
> URL: https://issues.apache.org/jira/browse/HBASE-10932
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Minor
> Attachments: HBASE-10932_v1.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check,
> such as making sure the row (record) count matches after exporting data from
> an RDBMS to HBase, or from one HBase cluster to another. Such a check
> usually has no strict response-time requirement.
> Meanwhile, with the current implementation, RowCounter launches one mapper
> per region, and each mapper sends one scan request. If the table is fairly
> big (say, tens of regions) and the MR cluster has enough CPU cores to run
> all the mappers at once, the parallel scan requests sent by the mappers
> become a real burden for the HBase cluster.
> So in this JIRA, we propose that rowcounter support an additional option,
> "--maps", to specify the number of mappers, and that each mapper be able to
> scan more than one region of the target table.