[jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control

Yu Li (JIRA) Thu, 24 Apr 2014 19:43:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980642#comment-13980642
 ]


Yu Li commented on HBASE-10932:
-------------------------------

Hi [~jdcryans],

Sorry, I had also forgotten about this issue...

{quote}
I doubt that RowCounter is the only job that needs to be throttled, what about 
VerifyReplication? Or Export?
This can be simply handled by a correctly configured job scheduler, that's what 
they do.
{quote}
I see, so you're suggesting we look for a more general solution for this kind
of control. But I don't quite follow the point about the job scheduler: as I
understand it, the job scheduler is configured at the cluster level and cannot
be configured per job, right? If so, I'm not sure we can change the scheduling
policy just for HBase, since many other kinds of jobs commonly run in the same
MR/YARN cluster and the HBase jobs are only a small portion of them.
Anyway, this is an interesting topic, and I will spend some more time thinking
about the VerifyReplication/Export cases and a general solution.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is some kind of data integrity check,
> e.g. after exporting data from an RDBMS to HBase, or from one HBase cluster
> to another, to make sure the row (record) counts match. Such checks are
> usually not time-sensitive.
> Meanwhile, in the current implementation, RowCounter launches one mapper per
> region, and each mapper sends one scan request. If the table is fairly large
> (say, tens of regions) and the MR cluster has enough CPU cores to run all of
> those mappers at once, the parallel scan requests sent by the mappers can
> place a real burden on the HBase cluster.
> So in this JIRA, we propose making RowCounter support an additional option
> "--maps" to specify the number of mappers, and allowing each mapper to scan
> more than one region of the target table.
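To illustrate the "--maps" idea, here is a minimal sketch of how contiguous regions could be merged so that each mapper scans more than one region. It is not taken from the attached patches: the class RegionGrouping, the helper groupRegions and the ScanRange type are hypothetical, and region boundaries are modeled as String start keys (HBase itself uses byte[] keys, with an empty key marking the start/end of the table).

```java
import java.util.ArrayList;
import java.util.List;

public class RegionGrouping {

    /** Hypothetical: one mapper's scan range [startKey, stopKey); "" as stopKey means end of table. */
    static final class ScanRange {
        final String startKey;
        final String stopKey;

        ScanRange(String startKey, String stopKey) {
            this.startKey = startKey;
            this.stopKey = stopKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + stopKey + ")";
        }
    }

    /**
     * Hypothetical helper: merges contiguous regions (given by their sorted start keys,
     * the first being "") into at most `maps` scan ranges with near-equal region counts.
     */
    static List<ScanRange> groupRegions(List<String> regionStartKeys, int maps) {
        if (maps <= 0) {
            throw new IllegalArgumentException("maps must be > 0");
        }
        List<ScanRange> ranges = new ArrayList<>();
        int regions = regionStartKeys.size();
        if (regions == 0) {
            return ranges;
        }
        int groups = Math.min(maps, regions); // never more mappers than regions
        int base = regions / groups;
        int extra = regions % groups;         // the first `extra` groups get one more region
        int first = 0;
        for (int g = 0; g < groups; g++) {
            int next = first + base + (g < extra ? 1 : 0);
            // The group stops where the next group's first region starts, or at the table end.
            String stop = next < regions ? regionStartKeys.get(next) : "";
            ranges.add(new ScanRange(regionStartKeys.get(first), stop));
            first = next;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 7 regions split among 3 mappers -> groups of 3, 2 and 2 regions.
        List<String> startKeys = List.of("", "b", "d", "f", "h", "k", "m");
        for (ScanRange r : groupRegions(startKeys, 3)) {
            System.out.println(r);
        }
    }
}
```

Running main prints the three ranges [, f), [f, k) and [k, ), so only three concurrent scans hit the cluster instead of seven. Splitting by region count is only one possible policy; a real implementation might instead balance by region size.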



--
This message was sent by Atlassian JIRA
(v6.2#6252)
