[ 
https://issues.apache.org/jira/browse/HADOOP-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647489#action_12647489
 ] 

Matei Zaharia commented on HADOOP-4627:
---------------------------------------

I think some of the issues I brought up are about more than job mix. For 
example, having all small jobs use the same 3 files means that data locality 
for these jobs will be poor, so their results won't be meaningful for a 
Hadoop cluster running small jobs on many different small files. Similarly, 
with smaller data sizes, the bottleneck in gridmix can be job submission, so 
the benchmark won't tell us much about performance in a real cluster. Should I 
open separate JIRAs for these issues?

> gridmix version 2
> -----------------
>
>                 Key: HADOOP-4627
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4627
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Runping Qi
>         Attachments: H-4627.txt
>
>
> The new gridmix differs from the original gridmix in the following ways:
> 1. Use an XML config file to specify the mix of job types and sizes in a load. 
> This provides finer-grained control.
> 2. Use JobControl to submit the gridmix load, instead of a shell script.
> 3. Include Pig jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
