[ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765781#comment-15765781 ]
Yi Liang commented on HBASE-5401:
---------------------------------
I have used this command and encountered the same issue. For example, when I run

hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=m randomWrite n

with --nomapred, it creates n threads (clients), and each thread writes m/n rows
into HBase. With the default MapReduce mode, it creates 10*n mappers, and each
mapper puts m/(n*10) rows into HBase.
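A minimal sketch of the arithmetic described above, assuming the rows are distributed exactly as stated in this comment (m rows split evenly across all tasks, with integer division dropping any remainder). The class PeTaskMath and its methods are hypothetical and are not part of PerformanceEvaluation; only the constant TASKS_PER_CLIENT = 10 comes from the code under discussion.

```java
// Hypothetical sketch of the task arithmetic described in this comment;
// this is NOT PerformanceEvaluation's actual code.
final class PeTaskMath {
    // The hard-coded multiplier discussed in this issue.
    static final int TASKS_PER_CLIENT = 10;

    /** With --nomapred: one thread per client. */
    static int threads(int nClients) {
        return nClients;
    }

    /** With the default MapReduce mode: TASKS_PER_CLIENT mappers per client. */
    static int mappers(int nClients) {
        return nClients * TASKS_PER_CLIENT;
    }

    /** Rows each task writes when m rows are spread evenly (remainder ignored). */
    static long rowsPerTask(long m, int tasks) {
        return m / tasks;
    }

    public static void main(String[] args) {
        long m = 1_000_000L; // example value for --rows=m
        int n = 5;           // example value for <nclients>
        System.out.println("nomapred: " + threads(n) + " threads x "
                + rowsPerTask(m, threads(n)) + " rows");
        System.out.println("mapred:   " + mappers(n) + " mappers x "
                + rowsPerTask(m, mappers(n)) + " rows");
    }
}
```

For n = 5 and m = 1,000,000 this prints 5 threads x 200000 rows for --nomapred and 50 mappers x 20000 rows for the MapReduce mode: the same total work, but ten times as many tasks.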
I think the {code}static int TASKS_PER_CLIENT = 10{code} here is unnecessary,
for three reasons:
1. If users want more mappers, they can simply increase the number of clients.
With the factor of 10 in place, however, users can only create 10, 20, 30, ...
mappers, which is not flexible.
2. TASKS_PER_CLIENT = 10 is hard-coded and invisible to the user. Sometimes a
user may want just 5 mappers for a job, but the current code will create 50
mappers.
3. <nclients> = 5 means 5 threads with --nomapred but 50 mappers with
MapReduce, which is a little inconsistent. (To be clear, I do not mean that a
mapper is the same as a thread, but it would be better to keep their counts the
same.)
What do you guys think?
> PerformanceEvaluation generates 10x the number of expected mappers
> ------------------------------------------------------------------
>
> Key: HBASE-5401
> URL: https://issues.apache.org/jira/browse/HBASE-5401
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0
> Reporter: Oliver Meyn
> Fix For: 2.0.0
>
> Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation
> randomWrite 10' there are 100 mappers spawned, rather than the expected 10.
> The culprit appears to be the outer loop in writeInputFile which sets up 10
> splits for every "asked-for client". I think the fix is just to remove that
> outer loop.
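The effect of the outer loop described above can be sketched as follows. This is a hypothetical illustration, not the real writeInputFile: SplitSketch, Split, and splits() are made-up names, and the row layout (each client owns a contiguous block of perClientRows rows, cut into tasksPerClient equal pieces) is an assumption. Passing tasksPerClient = 10 mimics the outer loop; passing 1 corresponds to removing it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how an outer loop multiplies the number of splits;
// names and row layout are illustrative, not PerformanceEvaluation's code.
final class SplitSketch {

    /** One split: the first row a task writes and how many rows it writes. */
    static final class Split {
        final long startRow;
        final long rows;

        Split(long startRow, long rows) {
            this.startRow = startRow;
            this.rows = rows;
        }
    }

    /**
     * Generates one split per (task, client) pair. tasksPerClient = 10 mimics
     * the outer loop from the issue; tasksPerClient = 1 removes it.
     */
    static List<Split> splits(int nClients, long perClientRows, int tasksPerClient) {
        List<Split> out = new ArrayList<>();
        long rowsPerTask = perClientRows / tasksPerClient; // remainder ignored
        for (int i = 0; i < tasksPerClient; i++) {         // the "outer loop"
            for (int j = 0; j < nClients; j++) {           // one per asked-for client
                long start = j * perClientRows + i * rowsPerTask;
                out.add(new Split(start, rowsPerTask));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splits(10, 100_000L, 10).size()); // with the outer loop
        System.out.println(splits(10, 100_000L, 1).size());  // outer loop removed
    }
}
```

With 10 clients this prints 100 and then 10, matching the 100 versus the expected 10 mappers reported above; in both cases the splits still cover the same 1,000,000 rows.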
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)