[ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628635#action_12628635 ]

Devaraj Das commented on HADOOP-3150:
-------------------------------------

*sigh* there is a problem with the 3150 patch I committed - 
FairScheduler.assignTasks has not been modified to take the cleanupTask into 
account (the task that cleans up the job's temp data, which previously used to 
be done by the JT). I have some thoughts/questions in mind regarding that 
(which ideally should have been discussed prior to commit)...

Stepping back on the cleanupTask: as implemented in this patch, it can be 
either a map or a reduce task, and once the job has finished running its 
regular map/reduce tasks, the cleanup task is run on the first free slot. That 
is, when a TT comes asking for a task, all jobs are checked to see whether they 
are ready to have their cleanup task run (see 
JobQueueTaskScheduler.assignTasks and JobInProgress.obtainCleanupTask). If so, 
depending on the type of task we want to assign to the TT, we hand out either a 
map or a reduce cleanup task. Once one cleanup TIP completes successfully, we 
simply kill the other cleanup TIP and mark the job complete.
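
Roughly, the assignment order looks like the toy sketch below. This is only an 
illustration of the ordering, with stand-in names (CleanupFirstScheduler, 
obtainRegularTask, jobQueue, etc.) rather than the actual 0.19 classes or 
signatures:

{code:java}
import java.util.*;

// Toy sketch only: Job, Task and jobQueue are stand-ins for JobInProgress,
// Task and the scheduler's job list; this is not the actual Hadoop API.
class CleanupFirstScheduler {
  interface Task {}
  interface Job {
    // Returns non-null only after all regular map/reduce TIPs are done.
    Task obtainCleanupTask();
    // Returns a runnable regular map/reduce task, or null.
    Task obtainRegularTask();
  }

  private final List<Job> jobQueue = new ArrayList<Job>();

  // A cleanup task from any job wins over regular tasks from every job,
  // which is what makes it the "highest priority" task cluster-wide.
  List<Task> assignTasks() {
    for (Job job : jobQueue) {
      Task cleanup = job.obtainCleanupTask();
      if (cleanup != null) {
        return Collections.singletonList(cleanup);
      }
    }
    for (Job job : jobQueue) {
      Task regular = job.obtainRegularTask();
      if (regular != null) {
        return Collections.singletonList(regular);
      }
    }
    return Collections.emptyList();
  }
}
{code}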

(Pros of the current approach)
The reason for treating the cleanup task as the *highest priority* one among 
all tasks from all jobs is to enable removal of the (essentially complete) 
job's state from the JT's memory as soon as possible. Since the cleanupTask is 
user code, the job's success/failure also depends on this task's 
success/failure. This approach makes job runtimes more consistent (especially 
relevant when we are running benchmark-type jobs).

(Cons of the current approach)
All schedulers have to be aware of cleanupTask in their assignTasks method.

One afterthought that comes to mind after having committed the patch is 
whether we should give the cleanup tasks such special treatment, or instead 
make them part of the job's regular map/reduce TIPs (that is, we inject two 
additional TIPs, one of type Map and one of type Reduce, into the original 
map/reduce TIPs) so that JobInProgress.obtainNewMapTask/ReduceTask can handle 
them as special cases (handing out a cleanup task only when the regular tasks 
of both types are complete). That way, the schedulers would not need to know 
about the cleanup task at all.
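
For concreteness, a toy sketch of what that alternative could look like 
(again stand-in names, not the actual JobInProgress fields/methods, with 
runnability/speculation checks elided): the cleanup TIPs ride along with the 
regular TIPs, and obtainNewMapTask only hands out the map-side cleanup once 
all regular TIPs of both types have finished, so assignTasks in the schedulers 
needs no cleanup-specific logic.

{code:java}
// Toy sketch of the alternative: the cleanup TIPs are injected into the
// job as ordinary TIPs (one Map, one Reduce) and handed out by the normal
// obtainNew*Task path only once every regular TIP of both types is done.
class JobWithInlineCleanup {
  interface Tip {
    boolean isComplete();
    Runnable getTaskToRun();
  }

  private Tip[] mapTips;         // regular map TIPs
  private Tip[] reduceTips;      // regular reduce TIPs
  private Tip mapCleanupTip;     // injected cleanup TIP of type Map
  private Tip reduceCleanupTip;  // injected cleanup TIP of type Reduce

  Runnable obtainNewMapTask() {
    for (Tip tip : mapTips) {
      if (!tip.isComplete()) {
        return tip.getTaskToRun();   // normal case: a regular map
      }
    }
    // Special case: all regular TIPs of *both* types have finished, so
    // the only thing left to hand out is the cleanup task.
    if (allComplete(reduceTips) && !mapCleanupTip.isComplete()) {
      return mapCleanupTip.getTaskToRun();
    }
    // obtainNewReduceTask would mirror this using reduceCleanupTip.
    return null;
  }

  private static boolean allComplete(Tip[] tips) {
    for (Tip tip : tips) {
      if (!tip.isComplete()) {
        return false;
      }
    }
    return true;
  }
}
{code}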

Depending on the discussion, we can either revert the patch and implement the 
alternative, or file a separate JIRA to address just FairScheduler.assignTasks.

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch, patch-3150.txt, patch-3150.txt, 
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, 
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, 
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, 
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, 
> patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and 
> move it down into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
