[ https://issues.apache.org/jira/browse/MAPREDUCE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784046#action_12784046 ]

Hong Tang commented on MAPREDUCE-1229:
--------------------------------------

Attached new patch that addresses the comments by Dick.

bq. 1: Should TestSimulator*JobSubmission check to see whether the total 
"runtime" was reasonable for the Policy?
Currently, each policy is tested as a separate test case. It may be hard to 
combine them and compare the virtual runtimes, which are only available as 
console output. I did do some basic sanity checks manually after the run.

bq. 2: minor nit: Should SimulatorJobSubmissionPolicy/getPolicy(Configuration) 
use valueOf(policy.toUpper()) instead of looping through the types?
Updated the patch based on the suggestion.
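
For reference, a minimal sketch of the suggested lookup; the enum constants and 
the configuration key below are assumptions for illustration, not necessarily 
what the patch uses:

{code}
import org.apache.hadoop.conf.Configuration;

public enum SimulatorJobSubmissionPolicy {
  REPLAY, STRESS, SERIAL;

  public static SimulatorJobSubmissionPolicy getPolicy(Configuration conf) {
    // Default to REPLAY when the key is unset; the key name is an assumption.
    String policy = conf.get("mumak.job-submission.policy", REPLAY.name());
    // valueOf() replaces the old loop over values(); it throws
    // IllegalArgumentException for an unrecognized policy name.
    return valueOf(policy.toUpperCase());
  }
}
{code}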

bq. 3: medium sized nit: in SimulatorJobClient.isOverloaded() there are two 
literals, 0.9 and 2.0F, that ought to be static private named values.
Added static final constants for the two magic values, along with comments.
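
A rough sketch of what the named constants could look like; the constant names 
and the simplified check below are hypothetical, since the real 
SimulatorJobClient.isOverloaded() derives its figures from cluster status:

{code}
public class OverloadCheckSketch {
  /** Cluster counts as overloaded when occupied map slots exceed this fraction. */
  private static final float OCCUPIED_MAP_SLOT_RATIO = 0.9f;
  /** Pending map tasks may exceed total map slots by at most this factor. */
  private static final float PENDING_TASK_TO_SLOT_RATIO = 2.0f;

  static boolean isOverloaded(int occupiedMapSlots, int totalMapSlots,
                              int pendingMapTasks) {
    return occupiedMapSlots > OCCUPIED_MAP_SLOT_RATIO * totalMapSlots
        || pendingMapTasks > PENDING_TASK_TO_SLOT_RATIO * totalMapSlots;
  }
}
{code}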

bq. 4: Here is my biggest point. The existing code cannot submit a job more 
often than once every five seconds when the jobs were spaced further apart than 
that and the policy is STRESS .
bq. 
bq. Please consider adding code to call the processLoadProbingEvent core code 
when we processJobCompleteEvent or a processJobSubmitEvent . That includes 
potentially adding a new LoadProbingEvent . This can lead to an accumulation 
because each LoadProbingEvent replaces itself, so we should track the ones that 
are in flight in a PriorityQueue and only add a new LoadProbingEvent whenever 
the new event has a time stamp strictly earlier than the earliest one already 
in flight. This will limit us to two events in flight with the current 
adjustLoadProbingInterval .
bq. 
bq. If you don't do that, then if a real dreadnaught of a job gets dropped into 
the system and the probing interval gets long it could take us a while to 
notice that we're okay to submit jobs, in the case where the job has many tasks 
finishing at about the same time, and we could submit tiny jobs as onsies every 
five seconds when the cluster is clear enough to accommodate lots of jobs. When 
the cluster can handle N jobs in less than 5N seconds for some N, we won't 
overload it with the existing code.
I changed the minimum load probing interval to 1 second (from 5 seconds). Note 
that when a job is submitted, it can take a few seconds before the JT assigns 
its map tasks to TTs with free map slots, so reducing this interval further 
could lead to artificial load spikes.

I also added a load check after each job completion: if the cluster is 
underloaded, we submit another job (and reset the load probing interval to the 
minimum value). This does introduce a potential danger that many jobs 
completing at the same time could inject a burst of new jobs into the system, 
but I think the risk is fairly low and would not worry much about it.
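
A minimal sketch of the job-completion hook described above; the class, method, 
and field names are assumptions, and the real SimulatorJobClient drives this 
through the simulator's discrete event queue rather than direct calls:

{code}
class JobCompletionLoadCheckSketch {
  private static final long MIN_LOAD_PROBING_INTERVAL_MS = 1000L; // 1 second

  private long loadProbingIntervalMs = MIN_LOAD_PROBING_INTERVAL_MS;

  void processJobCompleteEvent(long now) {
    if (!isOverloaded(now)) {
      submitNextJob(now);
      // Reset the probing interval so future probes start again at the
      // minimum spacing once the cluster has drained.
      loadProbingIntervalMs = MIN_LOAD_PROBING_INTERVAL_MS;
    }
  }

  // Placeholders standing in for the real cluster-status check and submission.
  boolean isOverloaded(long now) { return false; }
  void submitNextJob(long now) { }
}
{code}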

> [Mumak] Allow customization of job submission policy
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/mumak
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: mapreduce-1229-20091121.patch, 
> mapreduce-1229-20091123.patch, mapreduce-1229-20091130.patch
>
>
> Currently, mumak replays job submissions faithfully. To make mumak useful for 
> evaluation purposes, it would be great if we could support other job submission 
> policies, such as sequential job submission or stress job submission.
