[ http://issues.apache.org/jira/browse/HADOOP-428?page=comments#action_12429160 ]

Kimoon Kim commented on HADOOP-428:
-----------------------------------

Found a 2005 paper in which the authors modified condor_startd so that a node's ClassAd includes the list of SRM input files available on that execute node (see http://sdm.lbl.gov/~arie/papers/CoScheduling.SSDBM05.pdf). This allows a task to be scheduled on a node that already holds its input file, similar to how a Hadoop map task gets scheduled. The authors claim that modifying condor_startd was not very hard for them. The key optimization was to import only the files for the current job(s), rather than all files, so that Condor matchmaking avoids a scalability barrier.

> Condor and Hadoop Map Reduce integration
> ----------------------------------------
>
>          Key: HADOOP-428
>          URL: http://issues.apache.org/jira/browse/HADOOP-428
>      Project: Hadoop
>   Issue Type: New Feature
>   Components: mapred
>     Reporter: Devaraj Das
>  Assigned To: Devaraj Das
>
> The issue is about using/enhancing Condor's features for Hadoop's Map Reduce
> framework. Some of the early thoughts in this respect:
> * One should be able to submit a MR job that takes advantage of Condor's
> features like node reservation according to a job's requirements, monitoring
> of jobs, etc.
> * JobTracker and TaskTrackers work as Master/Workers in the Condor
> environment. One should be able to simply start a MR cluster and the cluster
> goes down when the job is done.
> * The classads can have an attribute for input file block locations that will
> be an input to Condor's scheduling decisions.
> * Condor's features of monitoring jobs can be leveraged to reschedule failed
> TaskTrackers. Checkpointing of JobTrackers can also probably be done so that
> if the JobTracker job dies for some reason, the failed jobs can be restarted
> to start from the point where the JobTracker was last checkpointed at
> (assuming the input data has not changed).
> * User priorities, job priorities should also be handled.
> If nodes are
> currently in use due to a job being run by one user, and another user of the
> same priority submits a new job, it gets queued and opportunistically the job
> of the second user is scheduled - for e.g., one master and 1 worker to start
> with and then 2 workers and so on... If the second user is of a higher
> priority, then the first user's job is completely suspended.
> Please add your thoughts on this topic.
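
To make the file-aware matchmaking idea from the comment concrete, here is a minimal Python sketch. It does not use any Condor or Hadoop API: the dictionaries only stand in for node ClassAds, and the names `advertise_files`, `pick_node`, `node_ads` and the file names are hypothetical, chosen for illustration. It assumes that each node advertises only the locally held files that a current job needs, and that a task prefers an idle node advertising its input file.

```python
def advertise_files(files_on_node, files_needed_by_current_jobs):
    """Return only the locally held files that some current job needs.

    This mirrors the optimization described in the comment: the node ad
    carries files for the current job(s) only, not every file on the node.
    """
    return sorted(set(files_on_node) & set(files_needed_by_current_jobs))


def pick_node(task_input, node_ads):
    """Prefer an idle node whose ad lists the task's input file.

    Falls back to the first idle node, or None if no node is idle.
    """
    idle = [ad for ad in node_ads if ad["idle"]]
    for ad in idle:
        if task_input in ad["files"]:
            return ad["name"]
    return idle[0]["name"] if idle else None


# Hypothetical inputs of the current job and two hypothetical execute nodes.
needed = {"part-0", "part-1"}
node_ads = [
    {"name": "node1", "idle": True,
     "files": advertise_files(["part-1", "old-7"], needed)},
    {"name": "node2", "idle": True,
     "files": advertise_files(["part-0", "old-9"], needed)},
]
```

With these ads, a task reading `part-0` would be placed on `node2`, while a task whose input is not advertised anywhere simply falls back to the first idle node. Keeping the advertised list restricted to current jobs keeps each ad small, which is the scaling point made in the comment.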
