[jira] Commented: (HADOOP-428) Condor and Hadoop Map Reduce integration

Kimoon Kim (JIRA) Fri, 18 Aug 2006 17:26:15 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-428?page=comments#action_12429160 ]
            
Kimoon Kim commented on HADOOP-428:
-----------------------------------


Found a 2005 paper in which the authors modified condor_startd so that the 
node ClassAd includes the list of SRM input files available on an execute node 
(see http://sdm.lbl.gov/~arie/papers/CoScheduling.SSDBM05.pdf). This allows a 
task to be scheduled on a node that already holds its input file, similar to 
how a Hadoop map task gets scheduled. The authors say modifying condor_startd 
was not particularly hard for them. The key optimization was not to import all 
files, but only the files needed by the current job(s), so that Condor 
matchmaking does not hit a scalability barrier.
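
The mechanism described in the paper can be illustrated with a toy model. 
This is not the paper's modified condor_startd nor Condor's real matchmaker: 
NodeAd, JobAd, advertised_files and match are hypothetical names, and the 
greedy first-fit placement is an assumption made for brevity. The sketch 
shows the two ideas together: nodes advertise only the local files that 
queued jobs need, and a job prefers a node that already holds its input.

```python
from dataclasses import dataclass


@dataclass
class NodeAd:
    """Hypothetical stand-in for an execute node's machine ClassAd."""
    name: str
    files: frozenset  # input files this node advertises as local


@dataclass
class JobAd:
    """Hypothetical stand-in for a job ClassAd with a single input file."""
    name: str
    input_file: str


def advertised_files(local_files, queued_jobs):
    """Advertise only local files that some queued job actually needs,
    instead of every file on the node, keeping the ads small."""
    wanted = {job.input_file for job in queued_jobs}
    return frozenset(local_files) & wanted


def match(jobs, nodes):
    """Greedy matchmaking: give each job a free node that already holds
    its input file if one exists, otherwise any free node."""
    free = list(nodes)
    assignment = {}
    for job in jobs:
        if not free:
            break  # remaining jobs stay queued
        local = [n for n in free if job.input_file in n.files]
        chosen = local[0] if local else free[0]
        free.remove(chosen)
        assignment[job.name] = chosen.name
    return assignment


jobs = [JobAd("j1", "a.dat"), JobAd("j2", "b.dat")]
on_disk = {"n1": {"b.dat", "x.dat", "y.dat"}, "n2": {"a.dat", "z.dat"}}
nodes = [NodeAd(name, advertised_files(files, jobs)) for name, files in on_disk.items()]
print(match(jobs, nodes))  # {'j1': 'n2', 'j2': 'n1'}
```

Filtering the advertised list by the queued jobs is what keeps each ad small 
regardless of how many files a node stores, which is the scaling point the 
paper makes.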

> Condor and Hadoop Map Reduce integration
> ----------------------------------------
>
>                 Key: HADOOP-428
>                 URL: http://issues.apache.org/jira/browse/HADOOP-428
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>
> The issue is about using/enhancing Condor's features for Hadoop's Map Reduce 
> framework. Some of the early thoughts in this respect:
> * One should be able to submit an MR job that takes advantage of Condor's 
> features, such as node reservation according to a job's requirements, job 
> monitoring, etc.
> * The JobTracker and TaskTrackers work as master/workers in the Condor 
> environment. One should be able to simply start an MR cluster that goes 
> down when the job is done.
> * The ClassAds can have an attribute for input file block locations that 
> will serve as an input to Condor's scheduling decisions.
> * Condor's job-monitoring features can be leveraged to reschedule failed 
> TaskTrackers. The JobTracker could probably also be checkpointed, so that if 
> the JobTracker job dies for some reason, the failed jobs can be restarted 
> from the point at which the JobTracker was last checkpointed (assuming the 
> input data has not changed).
> * User priorities and job priorities should also be handled. If nodes are 
> currently in use by a job run by one user, and another user of the same 
> priority submits a new job, the new job gets queued and is scheduled 
> opportunistically, e.g., one master and one worker to start with, then two 
> workers, and so on. If the second user has a higher priority, the first 
> user's job is suspended completely.
> Please add your thoughts on this topic.
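
The priority policy in the quoted description can be sketched as a small, 
stateless allocation function. This is a toy model under stated assumptions, 
not Condor's negotiator: Job and allocate are hypothetical names, a running 
job is assumed to hold one slot for its master (JobTracker) plus one per 
worker (TaskTracker), and a larger number is assumed to mean higher priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    priority: int  # larger number = higher priority (assumed convention)
    demand: int    # workers (TaskTrackers) the job could use right now


def allocate(total_slots, jobs):
    """Return {user: workers} for jobs given in submission order.

    A running job holds one slot for its master plus one slot per worker,
    so it needs at least two free slots to start. Jobs below the highest
    submitted priority are suspended and hold no slots at all.
    """
    if not jobs:
        return {}
    top = max(job.priority for job in jobs)
    free = total_slots
    alloc = {}
    for job in jobs:
        if job.priority < top or job.demand < 1 or free < 2:
            alloc[job.user] = 0  # suspended (lower priority) or still queued
            continue
        workers = min(job.demand, free - 1)  # one slot goes to the master
        alloc[job.user] = workers
        free -= 1 + workers
    return alloc


alice = Job("alice", priority=1, demand=9)
bob = Job("bob", priority=1, demand=5)
print(allocate(10, [alice, bob]))  # {'alice': 9, 'bob': 0}

# As alice's remaining tasks drain, bob grows opportunistically.
alice.demand = 7
print(allocate(10, [alice, bob]))  # {'alice': 7, 'bob': 1}
alice.demand = 6
print(allocate(10, [alice, bob]))  # {'alice': 6, 'bob': 2}

# A higher-priority submission suspends both earlier jobs completely.
carol = Job("carol", priority=2, demand=4)
print(allocate(10, [alice, bob, carol]))  # {'alice': 0, 'bob': 0, 'carol': 4}
```

Re-running the allocation whenever demand changes reproduces the "one master 
and one worker, then two workers" growth for a same-priority job, while a 
higher-priority job takes over the whole pool.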
