[
https://issues.apache.org/jira/browse/MAPREDUCE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784046#action_12784046
]
Hong Tang commented on MAPREDUCE-1229:
--------------------------------------
Attached a new patch that addresses Dick's comments.
bq. 1: Should TestSimulator*JobSubmission check to see whether the total
"runtime" was reasonable for the Policy?
Currently, each policy is tested as a separate test case. It would be hard to
combine them and compare the virtual runtimes, which are only reported as console
output. I did do some basic sanity checks manually after the runs.
bq. 2: minor nit: Should SimulatorJobSubmissionPolicy/getPolicy(Configuration)
use valueOf(policy.toUpper()) instead of looping through the types?
Updated in the patch based on the suggestion.
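For reference, the lookup now reads roughly like the sketch below (the enum values and the configuration key name here are assumptions, not copied verbatim from the patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of the valueOf-based lookup suggested above; enum values and the
// config key are assumptions, not the exact code in the patch.
public enum SimulatorJobSubmissionPolicy {
  REPLAY, SERIAL, STRESS;

  // Assumed configuration key name.
  static final String POLICY_KEY = "mumak.job-submission.policy";

  public static SimulatorJobSubmissionPolicy getPolicy(Configuration conf) {
    String policy = conf.get(POLICY_KEY, REPLAY.name());
    // valueOf() replaces the loop over values(); toUpperCase() makes the
    // lookup case-insensitive.
    return valueOf(policy.toUpperCase());
  }
}
{code}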
bq. 3: medium sized nit: in SimulatorJobClient.isOverloaded() there are two
literals, 0.9 and 2.0F, that ought to be static private named values.
Added static final constants for the two magic values and added comments explaining them.
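Roughly along these lines (the constant names and the surrounding logic in this sketch are assumptions, not the exact code in the patch):
{code:java}
// Sketch only: constant names and usage are assumptions, not the patch's code.
public class OverloadCheckSketch {
  // Threshold fraction of map-slot capacity above which the cluster is
  // considered overloaded (replaces the bare literal 0.9).
  private static final float OVERLOAD_MAPTASK_MAPSLOT_RATIO = 0.9f;
  // Factor by which the load-probing interval is stretched when the cluster
  // is overloaded (replaces the bare literal 2.0F).
  private static final float LOAD_PROBING_INTERVAL_BACKOFF = 2.0f;

  boolean isOverloaded(int pendingMaps, int totalMapSlots) {
    return pendingMaps > OVERLOAD_MAPTASK_MAPSLOT_RATIO * totalMapSlots;
  }

  long adjustLoadProbingInterval(long currentIntervalMs) {
    return (long) (currentIntervalMs * LOAD_PROBING_INTERVAL_BACKOFF);
  }
}
{code}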
bq. 4: Here is my biggest point. The existing code cannot submit a job more
often than once every five seconds when the jobs were spaced further apart than
that and the policy is STRESS.
bq.
bq. Please consider adding code to call the processLoadProbingEvent core code
when we processJobCompleteEvent or processJobSubmitEvent. That includes
potentially adding a new LoadProbingEvent. This can lead to an accumulation,
because each LoadProbingEvent replaces itself, so we should track the ones that
are in flight in a PriorityQueue and only add a new LoadProbingEvent whenever
the new event has a timestamp strictly earlier than the earliest one already
in flight. This will limit us to two events in flight with the current
adjustLoadProbingInterval.
bq.
bq. If you don't do that, then if a real dreadnought of a job gets dropped into
the system and the probing interval gets long, it could take us a while to
notice that we're okay to submit jobs, in the case where the job has many tasks
finishing at about the same time, and we could submit tiny jobs one at a time
every five seconds when the cluster is clear enough to accommodate lots of jobs.
When the cluster can handle N jobs in less than 5N seconds for some N, we won't
overload it with the existing code.
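For the record, the bookkeeping described above would look roughly like the sketch below (class and method names are assumed, not taken from the patch):
{code:java}
import java.util.PriorityQueue;

// Rough sketch of the suggested bookkeeping: keep the timestamps of in-flight
// LoadProbingEvents and only schedule a new probe if it would fire strictly
// earlier than the earliest one already pending. Names are assumptions.
class LoadProbeTracker {
  private final PriorityQueue<Long> inFlightProbes = new PriorityQueue<Long>();

  /** Returns true if a probe at time 'when' should actually be scheduled. */
  boolean maybeSchedule(long when) {
    Long earliest = inFlightProbes.peek();
    if (earliest == null || when < earliest) {
      inFlightProbes.add(when);
      return true;   // caller enqueues a new LoadProbingEvent at 'when'
    }
    return false;    // an earlier probe is already in flight
  }

  /** Called when a LoadProbingEvent with timestamp 'when' is processed. */
  void probeFired(long when) {
    inFlightProbes.remove(when);
  }
}
{code}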
I changed the minimum load probing interval to 1 second (from 5 seconds). Note
that when a job is submitted, it can take a few seconds before the JT assigns the
map tasks to TTs with free map slots, so reducing this interval further could
lead to artificial load spikes.
I also added a load check after each job completion: if the cluster is
underloaded, we submit another job and reset the load probing interval to the
minimum value. This does introduce a potential risk that many jobs complete at
about the same time and we inject a burst of jobs into the system, but I think
that risk is fairly low, so I would not worry much about it.
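A rough sketch of that behavior (constant and method names here are assumptions, not the exact code in the patch):
{code:java}
// Sketch of the behavior described above: on every job completion, re-check
// the load; if the cluster is not overloaded, submit the next job right away
// and reset the probing interval back to the minimum. Names are assumptions.
class StressSubmissionSketch {
  // Minimum load-probing interval, lowered from 5s to 1s in this patch.
  static final long MIN_LOAD_PROBING_INTERVAL_MS = 1000;

  private long loadProbingIntervalMs = MIN_LOAD_PROBING_INTERVAL_MS;

  void processJobCompleteEvent(ClusterStatusView cluster) {
    if (!cluster.isOverloaded()) {
      submitNextJob();
      // Start probing aggressively again now that capacity has freed up.
      loadProbingIntervalMs = MIN_LOAD_PROBING_INTERVAL_MS;
    }
  }

  void submitNextJob() { /* hand the next trace job to the simulated JT */ }

  interface ClusterStatusView { boolean isOverloaded(); }
}
{code}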
> [Mumak] Allow customization of job submission policy
> ----------------------------------------------------
>
> Key: MAPREDUCE-1229
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1229
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/mumak
> Affects Versions: 0.21.0, 0.22.0
> Reporter: Hong Tang
> Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1229-20091121.patch,
> mapreduce-1229-20091123.patch, mapreduce-1229-20091130.patch
>
>
> Currently, mumak replays job submission faithfully. To make mumak useful for
> evaluation purposes, it would be great if we could support other job submission
> policies, such as sequential job submission or stress job submission.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.