[jira] Commented: (MAPREDUCE-2138) Gridmix tests with different time interval mr traces (1min, 3min and 5min).

Vinay Kumar Thota (JIRA) Thu, 18 Nov 2010 04:12:40 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933403#action_12933403 ]


Vinay Kumar Thota commented on MAPREDUCE-2138:
----------------------------------------------

bq. Since Load v/s Sleep, Submitter v/s RoundRobin v/s Echo user-resolvers, 
Stress v/s Replay v/s Serial are almost independent options, we would ideally 
need test-cases for all possible permutations of these. To keep things 
reasonable though, we should at least have LoadJob run in each of Stress, 
Replay and Serial modes. For testing the user-resolvers, we can make do with 
SleepJob running with each of Submitter, RoundRobin and Echo user-resolvers. 
Add in a couple of extra test-cases for traces with different times (1 v/s 3 
v/s 5 minutes) and we're talking of having at least eight different test-cases 
for a modestly-reasonable test-suite for GridMix3.

I am covering all the scenarios you mentioned, but this ticket covers only the
three scenarios listed above; the remaining scenarios are covered in separate
JIRA tickets.
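As a rough illustration, the minimum suite from the quoted comment (LoadJob under each of the three submission policies, SleepJob with each of the three user-resolvers, plus two extra trace intervals) can be written down as data to check that it adds up to eight cases. This is a hypothetical Java sketch: the enum and record names are made up, and the values held fixed within each group (SubmitterUserResolver, STRESS, 1 min trace) are assumptions chosen only to make the count concrete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the minimal GridMix3 test matrix; names are illustrative.
public class GridmixTestMatrix {
    enum JobType { LOAD, SLEEP }
    enum Resolver { SUBMITTER, ROUND_ROBIN, ECHO }
    enum Policy { STRESS, REPLAY, SERIAL }

    record TestCase(JobType type, Resolver resolver, Policy policy, int traceMinutes) {}

    static List<TestCase> minimalSuite() {
        List<TestCase> cases = new ArrayList<>();
        // LoadJob under each submission policy (resolver and trace assumed fixed).
        for (Policy p : Policy.values()) {
            cases.add(new TestCase(JobType.LOAD, Resolver.SUBMITTER, p, 1));
        }
        // SleepJob with each user-resolver (policy and trace assumed fixed).
        for (Resolver r : Resolver.values()) {
            cases.add(new TestCase(JobType.SLEEP, r, Policy.STRESS, 1));
        }
        // Two extra cases for traces folded at 3 and 5 minutes.
        cases.add(new TestCase(JobType.LOAD, Resolver.SUBMITTER, Policy.STRESS, 3));
        cases.add(new TestCase(JobType.LOAD, Resolver.SUBMITTER, Policy.STRESS, 5));
        return cases;
    }

    public static void main(String[] args) {
        minimalSuite().forEach(System.out::println);
        System.out.println(minimalSuite().size() + " test cases");
    }
}
```

Listing the cases this way makes it easy to see which combinations a given ticket covers and which are left to other tickets.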

bq. I suggest changing GridmixJobStory, etc. to have names with, for example, 
Test as a prefix so that they do not clash with legitimate classes in the 
org.apache.hadoop.mapred.gridmix name-space that might be developed in the 
future.

Agreed; I have made the changes accordingly.

bq. In GridmixJobStory, jobstories and zombieJobs seem to be the same map but 
with different interfaces to the value. Since ZombieJob implements JobStory, 
can values with the latter interface not suffice? (Also, technically 
buildJobStories() can return a null map, so the callers should guard against 
this condition.)

Removed the duplicate method in the Utils class.
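A minimal sketch of the reviewer's point: because ZombieJob implements JobStory, a single map whose values are typed by the interface is enough, and callers can guard against buildJobStories() returning null. JobStory and ZombieJob below are simplified stand-ins named after the Rumen types, not the real Rumen API, and plain String keys stand in for job IDs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch with stand-in types; these are NOT the real Rumen JobStory/ZombieJob.
public class JobStoryMapSketch {
    interface JobStory {
        String getName();
    }

    // ZombieJob implements JobStory, so one Map<String, JobStory> can hold it.
    static final class ZombieJob implements JobStory {
        private final String name;
        ZombieJob(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    // Hypothetical builder that may return null (e.g. the trace is unreadable).
    static Map<String, JobStory> buildJobStories(boolean traceReadable) {
        if (!traceReadable) {
            return null;
        }
        Map<String, JobStory> stories = new LinkedHashMap<>();
        stories.put("job_1", new ZombieJob("sort"));
        stories.put("job_2", new ZombieJob("wordcount"));
        return stories;
    }

    // Caller-side guard: a null result is treated as "no job stories".
    static Map<String, JobStory> jobStoriesOrEmpty(boolean traceReadable) {
        Map<String, JobStory> stories = buildJobStories(traceReadable);
        if (stories == null) {
            return Collections.emptyMap();
        }
        return stories;
    }

    public static void main(String[] args) {
        System.out.println(jobStoriesOrEmpty(true).size());
        System.out.println(jobStoriesOrEmpty(false).size());
    }
}
```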

bq. There should be some class-description for GridmixJobStory. Also, 
GridmixJobVerification looks like a very awkward class that should perhaps be 
subsumed as methods elsewhere. Ditto for GridmixJobStory in fact - why does 
this class need to exist, especially since UtilsForGridmix.getJobStories() 
seems to do the same thing?

Done.

bq. Need better JavaDoc comments for UtilsForGridmix.listGridmixJobIDs(). There 
is also no input parameter named jobStatus for the method. In this method, you 
can also keep the value of client.getAllJobs() around instead of calling it in 
each iteration.

Done.
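The suggestion to keep the result of client.getAllJobs() around can be sketched as follows. ClusterClient, JobStatusLike, CountingClient and listRecentJobIds are hypothetical stand-ins, not the real JobClient API; the sketch only shows the job list being fetched once, outside the loop, with JavaDoc that names the parameters the method actually takes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch with hypothetical types; the real JobClient API differs.
public class ListJobIdsSketch {
    // Hypothetical stand-in for a job status entry.
    record JobStatusLike(String jobId, String jobName) {}

    // Hypothetical client; getAllJobs() may be a remote call, so avoid repeating it.
    interface ClusterClient {
        JobStatusLike[] getAllJobs();
    }

    /**
     * Returns the IDs of the last {@code count} jobs reported by the client.
     *
     * @param client the cluster client to query
     * @param count  maximum number of job IDs to return
     * @return job IDs in the order reported by the client
     */
    static List<String> listRecentJobIds(ClusterClient client, int count) {
        // Fetch the job list once instead of calling getAllJobs() per iteration.
        JobStatusLike[] jobs = client.getAllJobs();
        List<String> ids = new ArrayList<>();
        int start = Math.max(0, jobs.length - count);
        for (int i = start; i < jobs.length; i++) {
            ids.add(jobs[i].jobId());
        }
        return ids;
    }

    // Test double that counts how often getAllJobs() is invoked.
    static final class CountingClient implements ClusterClient {
        private final JobStatusLike[] jobs;
        int calls = 0;
        CountingClient(JobStatusLike... jobs) { this.jobs = jobs; }
        @Override public JobStatusLike[] getAllJobs() { calls++; return jobs.clone(); }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient(
            new JobStatusLike("job_1", "a"),
            new JobStatusLike("job_2", "b"),
            new JobStatusLike("job_3", "c"));
        System.out.println(listRecentJobIds(client, 2) + " after " + client.calls + " call(s)");
    }
}
```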

bq. In UtilsForGridmix.listGridmixOriginalJobIDs(), instead of using the 
job-name to figure out the original job's id, you can use the appropriate 
configuration property (MAPREDUCE-2137). Also, instead of having two separate 
methods to get the current and original job identifiers for GridMix3 jobs, you 
can either have a map or a list of simple objects (POJOs).

I still need the job name in order to exclude the GridMix input data generator
job.
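One way to combine both review suggestions while still excluding the data generator job by name is a small POJO holding both IDs, built in a single pass. Sketch only: ORIGINAL_JOB_ID_KEY and GENERATOR_JOB_NAME are hypothetical placeholders (neither the property introduced by MAPREDUCE-2137 nor the actual generator job name), and a plain Map stands in for the job configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: the key and job name below are hypothetical placeholders.
public class GridmixJobIdPairs {
    static final String ORIGINAL_JOB_ID_KEY = "hypothetical.gridmix.original.job.id";
    static final String GENERATOR_JOB_NAME = "HYPOTHETICAL_GRIDMIX_GENERATE_DATA";

    // Simplified view of a job: its ID, name and configuration properties.
    record JobInfo(String jobId, String jobName, Map<String, String> conf) {}

    // POJO holding both identifiers, replacing two parallel lookup methods.
    record JobIdPair(String currentId, String originalId) {}

    static List<JobIdPair> listJobIdPairs(List<JobInfo> jobs) {
        List<JobIdPair> pairs = new ArrayList<>();
        for (JobInfo job : jobs) {
            // The job name is still used to skip the input data generator job.
            if (GENERATOR_JOB_NAME.equals(job.jobName())) {
                continue;
            }
            // Read the original ID from configuration rather than parsing the name;
            // jobs without the property are skipped.
            String original = job.conf().get(ORIGINAL_JOB_ID_KEY);
            if (original != null) {
                pairs.add(new JobIdPair(job.jobId(), original));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<JobInfo> jobs = List.of(
            new JobInfo("job_0", GENERATOR_JOB_NAME, Map.of()),
            new JobInfo("job_1", "synthetic", Map.of(ORIGINAL_JOB_ID_KEY, "job_orig_7")));
        System.out.println(listJobIdPairs(jobs));
    }
}
```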

Please check the new patch, which addresses some of your comments.

> Gridmix tests with different time interval mr traces (1min, 3min and 5min).
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2138
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2138
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: MAPREDUCE-2138.patch
>
>
> 1. Generate input data based on cluster size, create the synthetic jobs
> from the 1 min folded MR trace, and submit the jobs with the following
> arguments:
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = SubmitterUserResolver
> GRIDMIX_SUBMISSION_POLICY = STRESS
> Input Size = 400 MB * No. of nodes in cluster.
> TRACE_FILE = 1 min folded trace.
> After execution completes, verify each job's status and summary (QueueName,
> UserName, StartTime, FinishTime, maps, reducers, counters, etc.).
> 2. Generate input data based on cluster size, create the synthetic jobs
> from the 3 min folded MR trace, and submit the jobs with the following
> arguments:
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = RoundRobinUserResolver
> GRIDMIX_SUBMISSION_POLICY = Replay
> Input Size = 200 MB * No. of nodes in cluster.
> TRACE_FILE = 3 min folded trace.
> PROXY_USERS = proxy users file path.
> After execution completes, verify each job's status, submitting user and
> summary (QueueName, UserName, StartTime, FinishTime, maps, reducers,
> counters, etc.).
> 3. Generate input data based on cluster size, create the synthetic jobs
> from the 5 min folded MR trace, and submit the jobs with the following
> arguments:
> GRIDMIX_JOB_TYPE = SleepJob
> GRIDMIX_USER_RESOLVER = EchoUserResolver
> GRIDMIX_MIN_FILE = 100 MB
> GRIDMIX_SUBMISSION_POLICY = Serial
> Input Size = 300 MB * No. of nodes in cluster.
> TRACE_FILE = 5 min folded trace.
> After execution completes, verify each job's status, file size and summary
> (QueueName, UserName, StartTime, FinishTime, maps, reducers, counters,
> etc.).
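For reference, the three scenarios in the issue description can be captured as data, with the generated input size computed as the per-node size times the number of nodes. The record and its fields are illustrative, not GridMix configuration keys; the 10-node cluster in main is an assumption, and PROXY_USERS and GRIDMIX_MIN_FILE are left out for brevity.

```java
import java.util.List;

// Illustrative sketch; field names are NOT GridMix configuration keys.
public class GridmixScenarios {
    record Scenario(String jobType, String userResolver, String submissionPolicy,
                    int traceMinutes, long inputMbPerNode) {
        // Total generated input = per-node size * number of nodes in the cluster.
        long inputSizeMb(int clusterNodes) {
            return inputMbPerNode * clusterNodes;
        }
    }

    static final List<Scenario> SCENARIOS = List.of(
        new Scenario("LoadJob", "SubmitterUserResolver", "STRESS", 1, 400),
        new Scenario("LoadJob", "RoundRobinUserResolver", "REPLAY", 3, 200),
        new Scenario("SleepJob", "EchoUserResolver", "SERIAL", 5, 300));

    public static void main(String[] args) {
        int nodes = 10; // hypothetical cluster size
        for (Scenario s : SCENARIOS) {
            System.out.printf("%s/%s/%s, %d min trace: %d MB input%n",
                s.jobType(), s.userResolver(), s.submissionPolicy(),
                s.traceMinutes(), s.inputSizeMb(nodes));
        }
    }
}
```

With 10 nodes, scenario 1 would generate 400 MB * 10 = 4000 MB of input.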

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
