[
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628635#action_12628635
]
Devaraj Das commented on HADOOP-3150:
-------------------------------------
*sigh* There is a problem with the 3150 patch I committed: 
FairScheduler.assignTasks has not been modified to take into account the
cleanupTask (the task that cleans up the job's temp data, which previously was
done by the JT). I have some thoughts and questions about that, which ideally
should have been discussed prior to the commit.
Stepping back on cleanupTask: as implemented in this patch, it can be either a
map or a reduce task, and once the job has finished running its regular
map/reduce tasks, the cleanup task is run on the first free slot. That is, when
a TT comes asking for a task, every job is checked to see whether it is ready
to have its cleanup task run (look at JobQueueTaskScheduler.assignTasks and
JobInProgress.obtainCleanupTask). If so, depending on the type of task we want
to assign to the TT, we give out either a map or a reduce cleanup task. Once
one cleanup TIP completes successfully, we simply kill the other cleanup TIP
and mark the job complete.
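
To make the flow above concrete, here is a minimal sketch of the "cleanup first" ordering. It is a toy model, not the code in the patch: the class CleanupFirstSketch, its Job type, and all method bodies are hypothetical stand-ins, and a regular task is assumed to be finished the moment it is handed out.

```java
import java.util.List;

// Hypothetical, simplified model; these are NOT the real Hadoop classes.
final class CleanupFirstSketch {
    enum SlotType { MAP, REDUCE }

    static final class Job {
        final String name;
        int remainingMaps;
        int remainingReduces;
        boolean mapCleanupLaunched;
        boolean reduceCleanupLaunched;
        boolean complete;

        Job(String name, int maps, int reduces) {
            this.name = name;
            this.remainingMaps = maps;
            this.remainingReduces = reduces;
        }

        // Assumption: a regular task counts as done as soon as it is handed out.
        boolean readyForCleanup() {
            return !complete && remainingMaps == 0 && remainingReduces == 0;
        }

        // Stand-in for the idea behind JobInProgress.obtainCleanupTask:
        // at most one cleanup attempt per slot type.
        String obtainCleanupTask(SlotType slot) {
            if (!readyForCleanup()) return null;
            if (slot == SlotType.MAP && !mapCleanupLaunched) {
                mapCleanupLaunched = true;
                return name + ":cleanup-map";
            }
            if (slot == SlotType.REDUCE && !reduceCleanupLaunched) {
                reduceCleanupLaunched = true;
                return name + ":cleanup-reduce";
            }
            return null;
        }

        String obtainRegularTask(SlotType slot) {
            if (slot == SlotType.MAP && remainingMaps > 0) {
                remainingMaps--;
                return name + ":map";
            }
            if (slot == SlotType.REDUCE && remainingReduces > 0) {
                remainingReduces--;
                return name + ":reduce";
            }
            return null;
        }

        // First cleanup attempt to succeed finishes the job; the other
        // attempt would be killed (not modelled here).
        void cleanupSucceeded() { complete = true; }
    }

    // The scheduler scans ALL jobs for a runnable cleanup task before it
    // considers any regular task.
    static String assignTask(List<Job> jobs, SlotType freeSlot) {
        for (Job job : jobs) {
            String task = job.obtainCleanupTask(freeSlot);
            if (task != null) return task;
        }
        for (Job job : jobs) {
            String task = job.obtainRegularTask(freeSlot);
            if (task != null) return task;
        }
        return null;
    }
}
```

Note that the cleanup pass lives in assignTask itself, which is why every scheduler would need its own copy of that first loop.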
(Pros of the current approach)
Now the reason for treating the cleanup task as the *highest priority* one
among all tasks from all jobs is to enable removal of the (essentially
complete) job state from the JT's memory as soon as possible. Since the
cleanupTask is user code, the job's success or failure also depends on whether
this task succeeds or fails. This approach makes the runtimes of jobs more
consistent (especially relevant when we are running benchmark-style jobs).
(Cons of the current approach)
All schedulers have to be aware of cleanupTask in their assignTasks method.
One afterthought, now that the patch is committed, is whether we should give
cleanup tasks such special treatment at all, or instead make them part of the
job's regular map/reduce TIPs (that is, we inject two additional TIPs, one of
type Map and another of type Reduce, into the original map/reduce TIPs) so that
JobInProgress.obtainNewMapTask/ReduceTask can handle them as special cases
(they hand out a cleanup task once the regular tasks of both types are
complete). This way schedulers don't have to be bothered about the cleanup
task.
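
As a rough illustration of the alternative, the sketch below folds the cleanup hand-out into the job's own task getters, so the scheduler loop contains no cleanup logic. Everything here is hypothetical (InjectedCleanupSketch, its Job type, the string task labels); it only borrows the method names obtainNewMapTask/obtainNewReduceTask from the discussion and again assumes a regular task is finished as soon as it is handed out.

```java
import java.util.List;

// Hypothetical, simplified model; these are NOT the real Hadoop classes.
final class InjectedCleanupSketch {
    static final class Job {
        int remainingMaps;
        int remainingReduces;
        boolean mapCleanupLaunched;
        boolean reduceCleanupLaunched;

        Job(int maps, int reduces) {
            this.remainingMaps = maps;
            this.remainingReduces = reduces;
        }

        // Assumption: a regular task counts as done as soon as it is handed out.
        private boolean regularTasksDone() {
            return remainingMaps == 0 && remainingReduces == 0;
        }

        // Regular maps first; the injected cleanup "map" only once the
        // regular tasks of BOTH types are complete.
        String obtainNewMapTask() {
            if (remainingMaps > 0) {
                remainingMaps--;
                return "map";
            }
            if (regularTasksDone() && !mapCleanupLaunched) {
                mapCleanupLaunched = true;
                return "cleanup-map";
            }
            return null;
        }

        // Mirror image for reduce slots.
        String obtainNewReduceTask() {
            if (remainingReduces > 0) {
                remainingReduces--;
                return "reduce";
            }
            if (regularTasksDone() && !reduceCleanupLaunched) {
                reduceCleanupLaunched = true;
                return "cleanup-reduce";
            }
            return null;
        }
    }

    // A scheduler written against this model never mentions cleanup.
    static String assignTask(List<Job> jobs, boolean wantMap) {
        for (Job job : jobs) {
            String task = wantMap ? job.obtainNewMapTask() : job.obtainNewReduceTask();
            if (task != null) return task;
        }
        return null;
    }
}
```

The trade-off visible in the sketch is that a cleanup task now waits its turn behind whatever ordering the scheduler applies to jobs, rather than jumping ahead of all regular tasks.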
Depending on the discussion, we can either revert the patch and implement the
alternative, or raise a JIRA to address just FairScheduler.assignTasks.
> Move task file promotion into the task
> --------------------------------------
>
> Key: HADOOP-3150
> URL: https://issues.apache.org/jira/browse/HADOOP-3150
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.0
>
> Attachments: 3150.patch, patch-3150.txt, patch-3150.txt,
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt,
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt,
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt,
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt,
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and
> move it down into the output format.