Should we move out the creation of setup/cleanup tasks from JobInProgress.initTasks()?
---------------------------------------------------------------------------------------

                 Key: HADOOP-4472
                 URL: https://issues.apache.org/jira/browse/HADOOP-4472
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Vivek Ratan


JobInProgress.initTasks() creates TIPs for map and reduce tasks, as well as for
the newly introduced setup and cleanup tasks. initTasks() is called by the
Schedulers because, for memory-optimization reasons, a Scheduler may choose to
initialize M/R tasks at different moments (the Capacity Scheduler, for example,
calls initTasks() only at the point when it considers a job for running). In
that sense, Schedulers 'own' the initialization of M/R tasks in a job. The JT,
however, 'owns' the setup and cleanup tasks: it schedules them, and Schedulers
are unaware of them. This creates a problematic dependency between the JT and
a Scheduler. For example, the Capacity Scheduler calls initTasks() and then
immediately calls JobInProgress.obtainNewMapTask() to get a map task. That is a
problem today: no map or reduce task can run before the setup task has run,
and the Capacity Scheduler does not know about the setup task.
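To make the ordering problem concrete, here is a toy Java model. It is not
Hadoop code: SketchJob, markSetupComplete() and the string task names are
hypothetical, and the real JobInProgress.obtainNewMapTask() has a different,
richer signature. The model assumes only what is stated above: initTasks()
creates both the M/R tasks and the setup task, and no map task may be handed
out until the JT has run the setup task.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of a job (NOT Hadoop's JobInProgress).
 * Assumption: map tasks may be handed out only after the setup task has run.
 */
class SketchJob {
    private final int numMaps;
    private final Deque<String> pendingMaps = new ArrayDeque<>();
    private boolean setupCreated = false;
    private boolean setupDone = false;

    SketchJob(int numMaps) {
        this.numMaps = numMaps;
    }

    /** Models the current behavior: creates map TIPs AND the setup task in one call. */
    void initTasks() {
        for (int i = 0; i < numMaps; i++) {
            pendingMaps.add("map_" + i);
        }
        setupCreated = true;
    }

    /** Hypothetical hook: the JT calls this when the setup task it scheduled finishes. */
    void markSetupComplete() {
        if (!setupCreated) {
            throw new IllegalStateException("setup task was never created");
        }
        setupDone = true;
    }

    /** Simplified signature: returns the next map task, or null if none may run yet. */
    String obtainNewMapTask() {
        if (!setupDone) {
            return null; // the setup task has not run, so no map may start
        }
        return pendingMaps.poll();
    }
}

public class SchedulerOrderingDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob(2);
        // A Scheduler that knows nothing about setup tasks does this:
        job.initTasks();
        System.out.println(job.obtainNewMapTask()); // null: setup has not run yet
        // Only the JT knows to run setup first; once it has, maps become available.
        job.markSetupComplete();
        System.out.println(job.obtainNewMapTask()); // map_0
    }
}
```

The Scheduler's call sequence (initTasks() followed immediately by
obtainNewMapTask()) cannot succeed on its own, because the event that unblocks
it happens on the JT side, which the Scheduler never sees.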

Either all Schedulers are explicitly aware of setup/cleanup tasks and their
dependencies on M/R tasks (in which case Schedulers 'own' the creation and
scheduling of all these tasks and must order them correctly), or the JT 'owns'
the setup/cleanup tasks and Schedulers are completely unaware of them (in which
case the creation of setup/cleanup tasks must be moved out of initTasks() into
a separate method that is called by the JT).

I think the latter is the right way to go (unless we implement HADOOP-4421, in 
which case the former option may be viable as well). 
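The second option can be sketched the same way. In this hypothetical model,
Schedulers are written against a narrow SchedulerView interface that exposes
only initTasks() and obtainNewMapTask(), while creation of the setup/cleanup
tasks lives in a separate JT-only method. The names SchedulerView, ProposedJob,
initSetupAndCleanupTasks() and setupSucceeded() are illustrative assumptions,
not existing Hadoop APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical narrow view handed to Schedulers: no setup/cleanup methods at all. */
interface SchedulerView {
    void initTasks();          // creates map/reduce TIPs only
    String obtainNewMapTask(); // null until the job may run maps
}

/** Hypothetical job model in which the JT owns setup/cleanup task creation. */
class ProposedJob implements SchedulerView {
    private final int numMaps;
    private final Deque<String> pendingMaps = new ArrayDeque<>();
    private boolean tasksInited = false;
    private boolean setupAndCleanupCreated = false;
    private boolean setupDone = false;

    ProposedJob(int numMaps) {
        this.numMaps = numMaps;
    }

    /** Scheduler-owned: may be called whenever the Scheduler chooses; idempotent. */
    @Override
    public void initTasks() {
        if (tasksInited) {
            return;
        }
        tasksInited = true;
        for (int i = 0; i < numMaps; i++) {
            pendingMaps.add("map_" + i);
        }
    }

    /** JT-owned (hypothetical name): creates the setup and cleanup tasks. */
    void initSetupAndCleanupTasks() {
        setupAndCleanupCreated = true;
    }

    /** JT-owned (hypothetical name): called when the setup task succeeds. */
    void setupSucceeded() {
        if (!setupAndCleanupCreated) {
            throw new IllegalStateException("setup task was never created");
        }
        setupDone = true;
    }

    /** The setup dependency is enforced here, so Schedulers need not know about it. */
    @Override
    public String obtainNewMapTask() {
        return setupDone ? pendingMaps.poll() : null;
    }
}

public class OwnershipSplitDemo {
    /** A Scheduler coded against SchedulerView cannot create or schedule setup tasks. */
    static String scheduleOneMap(SchedulerView job) {
        job.initTasks();
        return job.obtainNewMapTask();
    }

    public static void main(String[] args) {
        ProposedJob job = new ProposedJob(2);
        System.out.println(scheduleOneMap(job)); // null: the JT has not run setup
        job.initSetupAndCleanupTasks(); // JT side
        job.setupSucceeded();           // JT side
        System.out.println(scheduleOneMap(job)); // map_0
    }
}
```

Because the setup gate sits inside the job rather than in the Scheduler, the
Scheduler may call initTasks() at whatever moment suits its memory policy, and
the JT may create and run the setup task independently; the two no longer need
to agree on an ordering.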


-- 
This message is automatically generated by JIRA.