[ https://issues.apache.org/jira/browse/HADOOP-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465614 ]
Doug Cutting commented on HADOOP-719: ------------------------------------- > Would this be a good candidate to check into hadoop contrib? It sounds useful to me. I can't see why not. Contrib sounds appropriate. > Integration of Hadoop with batch schedulers > ------------------------------------------- > > Key: HADOOP-719 > URL: https://issues.apache.org/jira/browse/HADOOP-719 > Project: Hadoop > Issue Type: New Feature > Components: contrib/streaming > Reporter: Mahadev konar > Assigned To: Mahadev konar > > Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers like > Condor/torque/sun grid etc. Hadoop On Demand or HOD hereafter is a system > that populates a Hadoop instance using a shared batch scheduler. HOD will > find a requested number of nodes and start up Hadoop daemons on them. Users > map reduce jobs can then run on the hadoop instance. After the job is done, > HOD gives back the nodes to the shared batch scheduler. A group of users > will use HOD to acquire Hadoop instances of varying sizes and the batch > scheduler will schedule requests in a way that important jobs gain more > importance/resources and finish fast. Here are a list of requirements for HOD > and batch schedulers: > Key Requirements : > --- Should allocate the specified minimum number of nodes for a job > Many batch jobs can finish in time, only when enough resources are > allocated. Therefore batch scheduler should allocate the asked number of > nodes for a given job when the job starts. This is simple form of what's > known as gang scheduling. > Often the minimum nodes are not available right away, especially if the job > asked for a large number. The batch scheduler should support advance > reservation for important jobs so that the wait time can be determined. In > advance reservation, a reservation is created on earliest future point when > the preoccupied nodes become available. When nodes are currently idle but > booked by future reservations, batch scheduler is ok to give them to other > jobs to increase system utilization, but only when doing so does not delay > existing reservations. > --- run short urgent job without costing too much loss to long job. > Especially, should not kill job tracker of long job. > Some jobs, mostly short ones, are time sensitive and need urgent treatment. > Often, large portion of cluster nodes will be occupied by long running jobs. > Batch scheduler should be able to preempt long jobs and run urgent jobs. > Then, urgent jobs will finish quickly and long jobs can re-gain the nodes > afterward. > When preemption happens, HOD should minimize the loss to long jobs. > Especially, it should not kill job tracker of long job. > --- be able to dial up, at run time, share of resources for more important > projects. > Viewed at high level, a given cluster is shared by multiple projects. A > project consists of a number of jobs submitted by a group of users.Batch > scheduler should allow important projects to have more resources. This should > be tunable at run time as what projects deem more important may change over > time. > --- prevent malicious abuse of the system. > A shared cluster environment can be put in jeopardy if malicious or > erroneous job code does: > -- hold unneeded resources for a long period > -- use privileges for unworthy work > Such abuse can easily cause under-utilization or starvation of other jobs. > Batch scheduler should allow setting up policies for preventing resource > abuse by: > -- limit privileges to legitimate uses asking for proper amount > -- throttle peak use of resources per player > -- monitor and reduce starvation > --- The behavior should be simple and predictable > When status of the system is queried, we should be able to determine what > factors caused it to reach current status and what could be the future > behavior with or without our tuning on the system. > --- be portable to major resource managers > HOD design should be portable so that in future we are able to plugin > other resource manager. > Some of the key requirements are implemented by the batch schedulers. The > others need to be implemented by HOD. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira