[jira] Commented: (MAPREDUCE-2138) Gridmix tests with different time interval mr traces (1min, 3min and 5min).

Ranjit Mathew (JIRA) Thu, 04 Nov 2010 03:44:09 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928175#action_12928175
 ]


Ranjit Mathew commented on MAPREDUCE-2138:
------------------------------------------

Thanks for doing this. Some comments:
* Since Load v/s Sleep, Submitter v/s RoundRobin v/s Echo user-resolvers, and 
Stress v/s Replay v/s Serial are almost independent options, we would ideally 
need test-cases for all possible combinations of these. To keep things 
reasonable though, we should at least have {{LoadJob}} run in each of Stress, 
Replay and Serial modes. For testing the user-resolvers, we can make do with 
{{SleepJob}} running with each of Submitter, RoundRobin and Echo 
user-resolvers. Add in a couple of extra test-cases for traces with different 
times (1 v/s 3 v/s 5 minutes) and we're talking of having _at least_ eight 
different test-cases for a modestly-reasonable test-suite for GridMix3.
* I suggest changing {{GridmixJobStory}}, etc. to have names with, for example, 
{{Test}} as a prefix so that they do not clash with legitimate classes in the 
{{org.apache.hadoop.mapred.gridmix}} name-space that might be developed in the 
future.
* In {{GridmixJobStory}}, {{jobstories}} and {{zombieJobs}} seem to be the 
_same_ map but with different interfaces to the value. Since {{ZombieJob}} 
implements {{JobStory}}, would a single map with {{JobStory}} values not 
suffice? (Also, technically {{buildJobStories()}} can return a {{null}} map, 
so the callers should guard against this condition.)
* There should be some class-description for {{GridmixJobStory}}. Also, 
{{GridmixJobVerification}} looks like a very awkward class that should perhaps 
be subsumed as methods elsewhere. Ditto for {{GridmixJobStory}} in fact - why 
does this class need to exist, especially since 
{{UtilsForGridmix.getJobStories()}} seems to do the same thing?
* {{UtilsForGridmix.listGridmixJobIDs()}} needs better JavaDoc comments. 
There is also no input parameter named {{jobStatus}} for the method. In this 
method, you can also call {{client.getAllJobs()}} once before the loop and 
reuse the result instead of calling it in each iteration.
* In {{UtilsForGridmix.listGridmixOriginalJobIDs()}}, instead of using the 
job-name to figure out the original job's id, you can use the appropriate 
configuration property (MAPREDUCE-2137). Also, instead of having two separate 
methods to get the current and original job identifiers for GridMix3 jobs, you 
can have a single method that returns either a map or a list of simple 
objects (POJOs).
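
To make the first comment concrete, here is a rough sketch of how the reduced test matrix could be enumerated. All class, enum and method names are hypothetical (none come from the patch or from GridMix3), and the choice of which options to hold fixed in each group is only an assumption for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not taken from
// the patch or from the GridMix3 sources.
class GridmixTestMatrix {
    enum JobType { LOADJOB, SLEEPJOB }
    enum Policy { STRESS, REPLAY, SERIAL }
    enum Resolver { SUBMITTER, ROUND_ROBIN, ECHO }

    static final int[] TRACE_MINUTES = {1, 3, 5};

    static final class Case {
        final JobType jobType;
        final Policy policy;
        final Resolver resolver;
        final int traceMinutes;

        Case(JobType jobType, Policy policy, Resolver resolver, int traceMinutes) {
            this.jobType = jobType;
            this.policy = policy;
            this.resolver = resolver;
            this.traceMinutes = traceMinutes;
        }

        @Override
        public String toString() {
            return jobType + "/" + policy + "/" + resolver + "/" + traceMinutes + "min";
        }
    }

    // Size of the full cross-product, for comparison with the reduced matrix.
    static int fullProductSize() {
        return JobType.values().length * Policy.values().length
            * Resolver.values().length * TRACE_MINUTES.length;
    }

    static List<Case> build() {
        List<Case> cases = new ArrayList<>();
        // LoadJob once per submission policy (resolver and trace held fixed).
        for (Policy p : Policy.values()) {
            cases.add(new Case(JobType.LOADJOB, p, Resolver.SUBMITTER, 1));
        }
        // SleepJob once per user-resolver (policy and trace held fixed).
        for (Resolver r : Resolver.values()) {
            cases.add(new Case(JobType.SLEEPJOB, Policy.STRESS, r, 1));
        }
        // Assumption: the two extra trace lengths reuse LoadJob/STRESS/Submitter.
        cases.add(new Case(JobType.LOADJOB, Policy.STRESS, Resolver.SUBMITTER, 3));
        cases.add(new Case(JobType.LOADJOB, Policy.STRESS, Resolver.SUBMITTER, 5));
        return cases;
    }

    public static void main(String[] args) {
        for (Case c : build()) {
            System.out.println(c);
        }
        System.out.println(build().size() + " of " + fullProductSize() + " combinations");
    }
}
```

Varying one option at a time like this keeps the suite at eight cases, whereas the full cross-product of job type, policy, resolver and trace length would be 2 x 3 x 3 x 3 = 54.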

Enough comments for now, I guess. ;-)

> Gridmix tests with different time interval mr traces (1min, 3min and 5min).
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2138
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2138
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: MAPREDUCE-2138.patch
>
>
> 1. Generate input data based on cluster size, create the synthetic jobs 
> using the 1 min folded MR trace, and submit the jobs with the arguments 
> below.
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = SubmitterUserResolver
> GRIDMIX_SUBMISSION_POLICY = STRESS
> Input Size = 400 MB * No. of nodes in cluster.
> TRACE_FILE = 1 min folded trace.
> Verify each job's status and summary (QueueName, UserName, StartTime, 
> FinishTime, maps, reducers, counters, etc.) after completion of execution.
> 2. Generate input data based on cluster size, create the synthetic jobs 
> using the 3 min folded MR trace, and submit the jobs with the arguments 
> below.
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = RoundRobinUserResolver
> GRIDMIX_SUBMISSION_POLICY = Replay
> Input Size = 200 MB * No. of nodes in cluster.
> TRACE_FILE = 3 min folded trace.
> PROXY_USERS = proxy users file path.
> Verify each job's status, submitted user and summary (QueueName, UserName, 
> StartTime, FinishTime, maps, reducers, counters, etc.) after completion of 
> execution.
> 3. Generate input data based on cluster size, create the synthetic jobs 
> using the 5 min folded MR trace, and submit the jobs with the arguments 
> below.
> GRIDMIX_JOB_TYPE = SleepJob
> GRIDMIX_USER_RESOLVER = EchoUserResolver
> GRIDMIX_MIN_FILE = 100 MB
> GRIDMIX_SUBMISSION_POLICY = Serial
> Input Size = 300 MB * No. of nodes in cluster.
> TRACE_FILE = 5 min folded trace.
> Verify each job's status, file size and summary (QueueName, UserName, 
> StartTime, FinishTime, maps, reducers, counters, etc.) after completion of 
> execution.
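
For reference, the three scenarios in the description above could be captured as plain data, with the input size derived from the cluster size. This is only a sketch: the class and field names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption that the description does not state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names are illustrative, not part of the patch.
class GridmixScenarios {
    // Assumption: "MB" in the description means 1024 * 1024 bytes.
    static final long BYTES_PER_MB = 1024L * 1024L;

    static final class Scenario {
        final String jobType;
        final String userResolver;
        final String submissionPolicy;
        final int traceMinutes;
        final int mbPerNode;

        Scenario(String jobType, String userResolver, String submissionPolicy,
                 int traceMinutes, int mbPerNode) {
            this.jobType = jobType;
            this.userResolver = userResolver;
            this.submissionPolicy = submissionPolicy;
            this.traceMinutes = traceMinutes;
            this.mbPerNode = mbPerNode;
        }

        // Input Size = <mbPerNode> MB * number of nodes in the cluster.
        long inputBytes(int nodes) {
            if (nodes <= 0) {
                throw new IllegalArgumentException("nodes must be positive");
            }
            return (long) mbPerNode * nodes * BYTES_PER_MB;
        }
    }

    // The values below are copied from the three numbered items in the description.
    static List<Scenario> fromIssueDescription() {
        return Arrays.asList(
            new Scenario("LoadJob", "SubmitterUserResolver", "STRESS", 1, 400),
            new Scenario("LoadJob", "RoundRobinUserResolver", "Replay", 3, 200),
            new Scenario("SleepJob", "EchoUserResolver", "Serial", 5, 300));
    }
}
```

Under these assumptions, a 10-node cluster running the first scenario would need 400 MB * 10 = 4000 MB of generated input.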

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
