[
https://issues.apache.org/jira/browse/HADOOP-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647491#action_12647491
]
Runping Qi commented on HADOOP-4627:
------------------------------------
Actually, in real life, it is very common for many jobs to use the same
hot data.
Again, I don't think there is such a thing as the best benchmark load.
In fact, gridmix as a benchmark load is very different from the real load
of a real cluster.
As I said, our purpose in using gridmix is to track the impact of various
changes over time and to uncover performance-related problems.
So far, it has served us pretty well.
We want the nature of the gridmix load to remain the same so that we can
compare run results over time.
If you change how a small/medium job gets its input, then you change the
nature of the gridmix load. That makes it impossible to compare new runs with
baselines.
Having said that, we know that gridmix version one is not flexible enough
and that its job submission is not efficient.
That is why we created this version two. If you'd like to improve it, version
two should be a good starting point.
> gridmix version 2
> -----------------
>
> Key: HADOOP-4627
> URL: https://issues.apache.org/jira/browse/HADOOP-4627
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Runping Qi
> Attachments: H-4627.txt
>
>
> The new gridmix differs from the original gridmix in the following ways:
> 1. Use an XML config file to specify the mix of job types and sizes in a load.
> This provides finer-grained control.
> 2. Use JobControl to submit the gridmix load, instead of a shell script.
> 3. Include Pig jobs.