[ https://issues.apache.org/jira/browse/MAPREDUCE-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933403#action_12933403 ]
Vinay Kumar Thota commented on MAPREDUCE-2138:
----------------------------------------------
bq. Since Load v/s Sleep, Submitter v/s RoundRobin v/s Echo user-resolvers,
Stress v/s Replay v/s Serial are almost independent options, we would ideally
need test-cases for all possible permutations of these. To keep things
reasonable though, we should at least have LoadJob run in each of Stress,
Replay and Serial modes. For testing the user-resolvers, we can make do with
SleepJob running with each of Submitter, RoundRobin and Echo user-resolvers.
Add in a couple of extra test-cases for traces with different times (1 v/s 3
v/s 5 minutes) and we're talking of having at least eight different test-cases
for a modestly-reasonable test-suite for GridMix3.
I am covering all the scenarios you mentioned, but this ticket covers only the
three scenarios listed above; the remaining scenarios are covered in separate
JIRA tickets.
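The reviewer's estimate of "at least eight" test-cases, compared with the full permutation count, can be checked with a small sketch. The enum names below only mirror the options discussed in the review; they are illustrative labels, not GridMix3 types. "A couple of extra test-cases" for trace lengths is read as two, so this shows the arithmetic rather than the actual test suite.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative labels for the options discussed in the review (not GridMix3 types).
enum JobType { LOADJOB, SLEEPJOB }
enum UserResolver { SUBMITTER, ROUND_ROBIN, ECHO }
enum SubmissionPolicy { STRESS, REPLAY, SERIAL }

final class GridmixTestMatrix {
    // Trace lengths (in minutes) mentioned in the review.
    static final int[] TRACE_MINUTES = {1, 3, 5};

    /** Number of test-cases needed to cover every permutation of the options. */
    static int fullPermutationCount() {
        return JobType.values().length
                * UserResolver.values().length
                * SubmissionPolicy.values().length
                * TRACE_MINUTES.length;
    }

    /**
     * Reduced plan suggested in the review: LoadJob under each submission
     * policy, SleepJob under each user-resolver, plus extra trace-length cases.
     */
    static List<String> reducedPlan(int extraTraceCases) {
        List<String> plan = new ArrayList<>();
        for (SubmissionPolicy p : SubmissionPolicy.values()) {
            plan.add(JobType.LOADJOB + "/" + p);
        }
        for (UserResolver r : UserResolver.values()) {
            plan.add(JobType.SLEEPJOB + "/" + r);
        }
        for (int i = 0; i < extraTraceCases; i++) {
            plan.add("extra-trace-case-" + (i + 1));
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println("full permutations: " + fullPermutationCount()); // 2*3*3*3 = 54
        System.out.println("reduced plan: " + reducedPlan(2).size());       // 3 + 3 + 2 = 8
    }
}
```

The reduced plan keeps each dimension exercised at least once (8 cases) instead of the 54 needed for every combination.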
bq. I suggest changing GridmixJobStory, etc. to have names with, for example,
Test as a prefix so that they do not clash with legitimate classes in the
org.apache.hadoop.mapred.gridmix name-space that might be developed in the
future.
Agreed; I have made the changes accordingly.
bq. In GridmixJobStory, jobstories and zombieJobs seem to be the same map but
with different interfaces to the value. Since ZombieJob implements JobStory,
can values with the latter interface not suffice? (Also, technically
buildJobStories() can return a null map, so the callers should guard against
this condition.)
Removed the duplicate method from the Utils class.
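The reviewer's point (one map is enough because ZombieJob implements JobStory, and callers must guard against a null map) can be sketched as follows. `JobStory` and `ZombieJob` here are minimal stand-ins for the Rumen types in `org.apache.hadoop.tools.rumen`, which expose many more methods; `JobStoryLookup`, `buildJobStories` and `nameOf` are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the Rumen types; the real interfaces are much larger.
interface JobStory {
    String getName();
}

final class ZombieJob implements JobStory {
    private final String name;

    ZombieJob(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

final class JobStoryLookup {
    /**
     * Hypothetical helper: since ZombieJob implements JobStory, a single map
     * typed on the interface is enough; no parallel ZombieJob map is needed.
     * Returns an empty map instead of null when there is nothing to read.
     */
    static Map<String, JobStory> buildJobStories(Map<String, ZombieJob> parsed) {
        if (parsed == null) {
            return Collections.emptyMap();
        }
        return new HashMap<>(parsed); // ZombieJob values stored as JobStory
    }

    /** Caller-side guard for an API that may still return null. */
    static String nameOf(Map<String, JobStory> stories, String jobId) {
        if (stories == null) {
            return null;
        }
        JobStory story = stories.get(jobId);
        return story == null ? null : story.getName();
    }
}
```

Returning an empty map removes the null case at the source; the guard in `nameOf` covers callers of code that has not been changed yet.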
bq. There should be some class-description for GridmixJobStory. Also,
GridmixJobVerification looks like a very awkward class that should perhaps be
subsumed as methods elsewhere. Ditto for GridmixJobStory in fact - why does
this class need to exist, especially since UtilsForGridmix.getJobStories()
seems to do the same thing?
Done.
bq. Need better JavaDoc comments for UtilsForGridmix.listGridmixJobIDs(). There
is also no input parameter named jobStatus for the method. In this method, you
can also keep the value of client.getAllJobs() around instead of calling it in
each iteration.
Done.
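The suggestion to keep the result of `client.getAllJobs()` instead of calling it on every iteration can be sketched as below. In Hadoop, `JobClient.getAllJobs()` returns a `JobStatus[]`; here a hypothetical `ClusterClient` interface returns plain job-ID strings so the example stays self-contained. The selection logic (return the last `jobCount` entries) is an assumption, since the actual body of `listGridmixJobIDs()` is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the cluster client; each job is reduced to its ID.
interface ClusterClient {
    String[] getAllJobs();
}

final class GridmixJobIds {
    /**
     * Sketch of listGridmixJobIDs: fetch the job list once, keep it in a
     * local variable, then walk it, instead of calling getAllJobs() inside
     * the loop.
     *
     * @param client   client used to query the cluster
     * @param jobCount number of trailing job IDs to return
     * @return up to jobCount job IDs, in the order the client reported them
     */
    static List<String> listGridmixJobIDs(ClusterClient client, int jobCount) {
        String[] allJobs = client.getAllJobs(); // single call to the cluster
        List<String> ids = new ArrayList<>();
        int start = Math.max(0, allJobs.length - jobCount);
        for (int i = start; i < allJobs.length; i++) {
            ids.add(allJobs[i]);
        }
        return ids;
    }
}
```

Besides saving remote calls, working on one snapshot keeps the loop consistent if jobs finish or start while it runs.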
bq. In UtilsForGridmix.listGridmixOriginalJobIDs(), instead of using the
job-name to figure out the original job's id, you can use the appropriate
configuration property (MAPREDUCE-2137). Also, instead of having two separate
methods to get the current and original job identifiers for GridMix3 jobs, you
can either have a map or a list of simple objects (POJOs).
I need the job name because I want to exclude the GridMix input data generator
job.
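Both needs can coexist: the job name is used only to skip the data-generation job, while the original job ID is read from configuration and returned together with the current ID in one list of simple objects, as the reviewer suggested. The property name `gridmix.job.original-job-id` (for MAPREDUCE-2137) and the generator job name `GRIDMIX_GENERATE_INPUT_DATA` are assumptions to be checked against the actual patch; `JobInfo`, `JobIdPair` and `pairJobIds` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class GridmixJobMapping {
    // Assumption: configuration key holding the original trace job's ID.
    static final String ORIGINAL_JOB_ID_KEY = "gridmix.job.original-job-id";
    // Assumption: job name used by GridMix3's input-data generation job.
    static final String GENERATE_DATA_JOB_NAME = "GRIDMIX_GENERATE_INPUT_DATA";

    /** Hypothetical view of a completed job: its ID, name and configuration. */
    static final class JobInfo {
        final String id;
        final String name;
        final Map<String, String> conf;

        JobInfo(String id, String name, Map<String, String> conf) {
            this.id = id;
            this.name = name;
            this.conf = conf;
        }
    }

    /** Simple POJO pairing a GridMix3 job ID with the trace job it simulates. */
    static final class JobIdPair {
        final String currentId;
        final String originalId;

        JobIdPair(String currentId, String originalId) {
            this.currentId = currentId;
            this.originalId = originalId;
        }
    }

    /** Returns one pair per simulated job, skipping the data-generation job. */
    static List<JobIdPair> pairJobIds(List<JobInfo> jobs) {
        List<JobIdPair> pairs = new ArrayList<>();
        for (JobInfo job : jobs) {
            // The job name is still needed, but only to drop the generator job.
            if (GENERATE_DATA_JOB_NAME.equals(job.name)) {
                continue;
            }
            // The original ID comes from the configuration, not the job name.
            String originalId = job.conf.get(ORIGINAL_JOB_ID_KEY);
            if (originalId != null) {
                pairs.add(new JobIdPair(job.id, originalId));
            }
        }
        return pairs;
    }
}
```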
Please check the new patch, which addresses some of your comments.
> Gridmix tests with different time interval mr traces (1min, 3min and 5min).
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-2138
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2138
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Components: test
> Reporter: Vinay Kumar Thota
> Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-2138.patch
>
>
> 1. Generate input data based on the cluster size, create synthetic jobs from
> the 1-minute folded MR trace, and submit the jobs with the following arguments:
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = SubmitterUserResolver
> GRIDMIX_SUBMISSION_POLICY = STRESS
> Input Size = 400 MB * number of nodes in the cluster
> TRACE_FILE = 1-minute folded trace
> After execution completes, verify each job's status and summary (QueueName,
> UserName, StartTime, FinishTime, maps, reducers, counters, etc.).
> 2. Generate input data based on the cluster size, create synthetic jobs from
> the 3-minute folded MR trace, and submit the jobs with the following arguments:
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = RoundRobinUserResolver
> GRIDMIX_SUBMISSION_POLICY = REPLAY
> Input Size = 200 MB * number of nodes in the cluster
> TRACE_FILE = 3-minute folded trace
> PROXY_USERS = path to the proxy users file
> After execution completes, verify each job's status, submitting user and
> summary (QueueName, UserName, StartTime, FinishTime, maps, reducers,
> counters, etc.).
> 3. Generate input data based on the cluster size, create synthetic jobs from
> the 5-minute folded MR trace, and submit the jobs with the following arguments:
> GRIDMIX_JOB_TYPE = SleepJob
> GRIDMIX_USER_RESOLVER = EchoUserResolver
> GRIDMIX_MIN_FILE = 100 MB
> GRIDMIX_SUBMISSION_POLICY = SERIAL
> Input Size = 300 MB * number of nodes in the cluster
> TRACE_FILE = 5-minute folded trace
> After execution completes, verify each job's status, file size and summary
> (QueueName, UserName, StartTime, FinishTime, maps, reducers, counters, etc.).
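The three scenarios in the description can be captured as data, which makes the input-size rule (per-node size multiplied by the number of nodes) explicit. This is a sketch under assumptions: the property keys (`gridmix.job.type`, `gridmix.user.resolve.class`, `gridmix.job-submission.policy`, `gridmix.min.file.size`, the last in bytes) and the resolver package are taken to match the GridMix3 sources of the time and should be verified; `GridmixScenario` itself is hypothetical, and the proxy-users file for scenario 2 is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class GridmixScenario {
    // Assumption: property keys and resolver package as in GridMix3 sources;
    // verify against the actual code before relying on them.
    static final String JOB_TYPE_KEY = "gridmix.job.type";
    static final String USER_RESOLVER_KEY = "gridmix.user.resolve.class";
    static final String POLICY_KEY = "gridmix.job-submission.policy";
    static final String MIN_FILE_KEY = "gridmix.min.file.size";
    static final String RESOLVER_PACKAGE = "org.apache.hadoop.mapred.gridmix.";

    final String jobType;      // LOADJOB or SLEEPJOB
    final String userResolver; // simple class name of the user resolver
    final String policy;       // STRESS, REPLAY or SERIAL
    final long mbPerNode;      // generated input per node, in MB
    final int traceMinutes;    // length of the folded trace
    final long minFileMb;      // 0 means "use the GridMix default"

    GridmixScenario(String jobType, String userResolver, String policy,
                    long mbPerNode, int traceMinutes, long minFileMb) {
        this.jobType = jobType;
        this.userResolver = userResolver;
        this.policy = policy;
        this.mbPerNode = mbPerNode;
        this.traceMinutes = traceMinutes;
        this.minFileMb = minFileMb;
    }

    /** Total input to generate: per-node size times the number of nodes. */
    long inputSizeMb(int nodes) {
        return mbPerNode * nodes;
    }

    /** Properties to pass to GridMix for this scenario (sketch only). */
    Map<String, String> properties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(JOB_TYPE_KEY, jobType);
        props.put(USER_RESOLVER_KEY, RESOLVER_PACKAGE + userResolver);
        props.put(POLICY_KEY, policy);
        if (minFileMb > 0) {
            props.put(MIN_FILE_KEY, String.valueOf(minFileMb * 1024 * 1024)); // bytes
        }
        return props;
    }

    // Scenarios 1-3 from the description; the proxy-users file used by
    // RoundRobinUserResolver is supplied separately and is not modelled here.
    static final GridmixScenario[] SCENARIOS = {
        new GridmixScenario("LOADJOB", "SubmitterUserResolver", "STRESS", 400, 1, 0),
        new GridmixScenario("LOADJOB", "RoundRobinUserResolver", "REPLAY", 200, 3, 0),
        new GridmixScenario("SLEEPJOB", "EchoUserResolver", "SERIAL", 300, 5, 100),
    };
}
```

On a 10-node cluster, for example, scenario 1 would generate 400 MB * 10 = 4000 MB of input.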