[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964370#comment-13964370 ]
haosdent commented on HBASE-10932:
----------------------------------

{quote}
the configuration parameter is still called "maps".
{quote}
"scanner.num" may be better.

{quote}
Let's say you use this new "maps" configuration and set to 20.
{quote}
As a user, I would probably set it to 2 or some other low value here.

Anyway, I think this is a useful issue. Because I have some important online businesses running on my clusters, any unnecessary heavy IO could be unacceptable. [~jdcryans] focuses on code style, while [~carp84] focuses on how to handle this scenario and make the number of mappers configurable. Maybe we need a consensus here on which way to work around this issue.

Just my opinions.

> Improve RowCounter to allow mapper number set/control
> -----------------------------------------------------
>
>                 Key: HBASE-10932
>                 URL: https://issues.apache.org/jira/browse/HBASE-10932
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is to do some kind of data integrity
> checking, like after exporting some data from RDBMS to HBase, or from one
> HBase cluster to another, making sure the row(record) number matches. Such
> check commonly won't require much on response time.
> Meanwhile, based on current impl, RowCounter will launch one mapper per
> region, and each mapper will send one scan request. Assuming the table is
> kind of big like having tens of regions, and the cpu core number of the whole
> MR cluster is also enough, the parallel scan requests sent by mapper would be
> a real burden for the HBase cluster.
> So in this JIRA, we're proposing to make rowcounter support an additional
> option "--maps" to specify mapper number, and make each mapper able to scan
> more than one region of the target table.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
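
The proposal in the issue description (cap the number of mappers and let each mapper scan more than one region) can be pictured as a grouping step over the table's region list. The following is only a rough sketch under stated assumptions: the class RegionGrouping and its group method are hypothetical names, regions are represented as plain strings, and nothing here is taken from the attached patches or from HBase's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of HBase or of the patches. */
final class RegionGrouping {

    /**
     * Splits an ordered list of region names into at most maxMappers
     * contiguous groups whose sizes differ by at most one.
     * Each group stands for the regions a single mapper would scan.
     */
    static List<List<String>> group(List<String> regions, int maxMappers) {
        if (maxMappers < 1) {
            throw new IllegalArgumentException("maxMappers must be >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        int n = regions.size();
        // Never create more groups than there are regions.
        int groupCount = Math.min(maxMappers, n);
        int start = 0;
        for (int i = 0; i < groupCount; i++) {
            // The first (n % groupCount) groups take one extra region.
            int size = n / groupCount + (i < n % groupCount ? 1 : 0);
            groups.add(new ArrayList<>(regions.subList(start, start + size)));
            start += size;
        }
        return groups;
    }
}
```

With five regions and a cap of 2, this yields two groups of three and two regions, so at most two scans run in parallel instead of five. Keeping each group contiguous means a mapper covers adjacent key ranges, which is one possible design choice rather than a requirement stated in the issue.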