[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963167#comment-13963167 ]
Jean-Daniel Cryans commented on HBASE-10932:
--------------------------------------------

It's probably a bug that TableInputFormatBase doesn't do it. Looking at the old one (in org.apache.hadoop.hbase.mapred), you can see that it does this:

{code}
 * Splits are created in number equal to the smallest between numSplits and
 * the number of {@link HRegion}s in the table. If the number of splits is
 * smaller than the number of {@link HRegion}s then splits are spanned across
 * multiple {@link HRegion}s and are grouped the most evenly possible. In the
 * case splits are uneven the bigger splits are placed first in the
 * {@link InputSplit} array.
{code}

And you don't need a new parameter.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check,
> for example after exporting data from an RDBMS to HBase, or from one
> HBase cluster to another, to make sure the row (record) counts match. Such
> checks usually do not need a fast response time.
> Meanwhile, with the current implementation, RowCounter launches one mapper
> per region, and each mapper sends one scan request. If the table is fairly
> big, say tens of regions, and the MR cluster has enough CPU cores to run all
> mappers at once, the parallel scan requests sent by the mappers become a
> real burden for the HBase cluster.
> So in this JIRA, we propose adding an option "--maps" to rowcounter to
> specify the number of mappers, and letting each mapper scan more than one
> region of the target table.
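
For illustration, the grouping rule described in the Javadoc quoted above can be sketched in plain Java. This is a minimal sketch, not the HBase implementation: the class name RegionGroupingSketch and the method group are hypothetical, and regions are represented only by their index rather than by start and end keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: shows one way to group regions
// into at most numSplits contiguous splits, with bigger splits first.
class RegionGroupingSketch {

    /** Returns {firstRegion, lastRegion} index pairs (inclusive), one per split. */
    static List<int[]> group(int numRegions, int numSplits) {
        if (numRegions < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("numRegions must be >= 0 and numSplits > 0");
        }
        List<int[]> splits = new ArrayList<>();
        if (numRegions == 0) {
            return splits;
        }
        // Never create more splits than there are regions.
        int realNumSplits = Math.min(numSplits, numRegions);
        int base = numRegions / realNumSplits;       // regions every split gets
        int remainder = numRegions % realNumSplits;  // leftover regions to hand out
        int start = 0;
        for (int i = 0; i < realNumSplits; i++) {
            // The first `remainder` splits take one extra region each,
            // so the bigger splits come first in the list.
            int size = base + (i < remainder ? 1 : 0);
            splits.add(new int[] {start, start + size - 1});
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // 10 regions requested as 3 splits: sizes 4, 3, 3.
        for (int[] s : group(10, 3)) {
            System.out.println("regions " + s[0] + ".." + s[1]);
        }
    }
}
```

With 10 regions and 3 requested splits this prints the ranges 0..3, 4..6 and 7..9. In a real input format, each range would presumably become one split running from the start key of its first region to the end key of its last region.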