[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974306#comment-13974306 ]

Jean-Daniel Cryans commented on HBASE-10932:
--------------------------------------------

Hey [~carp84], I forgot about this issue, let me address your latest replies.

bq. I thought it was designed on purpose to make each mapper scan just one
single region

That's more an implementation detail than a design, and we can further improve 
the implementation by giving more control to the power users.

bq. This is especially useful in a multi-tenant environment, when we need to
check data integrity for one user after importing data but don't want the
scan load to slow down the response time (RT) of other users' requests.

Right, but again, resource management is a broader issue. I doubt that
RowCounter is the only job that needs to be throttled; what about
VerifyReplication? Or Export? Those jobs usually aren't latency-sensitive and
can run in the background. This can simply be handled by a correctly
configured job scheduler; that's what schedulers are for.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check,
> e.g. after exporting data from an RDBMS to HBase, or from one HBase cluster
> to another, making sure the row (record) counts match. Such a check usually
> isn't latency-sensitive.
> Meanwhile, with the current implementation, RowCounter launches one mapper
> per region, and each mapper sends one scan request. If the table is fairly
> large, say tens of regions, and the MR cluster has enough CPU cores to run
> all those mappers at once, the parallel scan requests sent by the mappers
> can be a real burden on the HBase cluster.
> So in this JIRA, we propose making RowCounter support an additional option,
> "--maps", to specify the number of mappers, and making each mapper able to
> scan more than one region of the target table.
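The proposal amounts to replacing "one split per region" with a fixed number of splits, each covering several adjacent regions. A minimal sketch in Java follows, assuming regions arrive as sorted, contiguous [start, end) key ranges, with an empty string marking the table's open start and end. `KeyRange` and `groupRegions` are hypothetical names for illustration only; they are not part of HBase's RowCounter or TableInputFormat API, real HBase row keys are `byte[]` rather than `String`, and the attached patches may group regions differently.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionGrouping {
    /** Hypothetical half-open key range [start, end); "" at either end means the table boundary. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Hypothetical helper: merges sorted, contiguous region ranges into at most
     * {@code maps} groups, so each mapper scans one contiguous key range that
     * spans one or more adjacent regions.
     */
    static List<KeyRange> groupRegions(List<KeyRange> regions, int maps) {
        if (maps <= 0) {
            throw new IllegalArgumentException("maps must be positive");
        }
        int n = regions.size();
        int groups = Math.min(maps, n); // never more mappers than regions
        List<KeyRange> result = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            // Spread regions as evenly as possible: group g covers regions [from, to).
            int from = (int) ((long) g * n / groups);
            int to = (int) ((long) (g + 1) * n / groups);
            result.add(new KeyRange(regions.get(from).start, regions.get(to - 1).end));
        }
        return result;
    }

    public static void main(String[] args) {
        List<KeyRange> regions = List.of(
                new KeyRange("", "b"), new KeyRange("b", "d"), new KeyRange("d", "f"),
                new KeyRange("f", "h"), new KeyRange("h", ""));
        // Five regions, two mappers: prints "[, d)" then "[d, )".
        for (KeyRange r : groupRegions(regions, 2)) {
            System.out.println(r);
        }
    }
}
```

Grouping adjacent regions keeps each mapper's work a single contiguous scan, and asking for more mappers than there are regions simply falls back to the current one-mapper-per-region behavior.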



--
This message was sent by Atlassian JIRA
(v6.2#6252)
