Integration of Hadoop with batch schedulers
-------------------------------------------

                 Key: HADOOP-719
                 URL: http://issues.apache.org/jira/browse/HADOOP-719
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/streaming
            Reporter: Mahadev konar
         Assigned To: Mahadev konar


Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers like 
Condor/torque/sun grid etc. Hadoop On Demand or HOD hereafter is a system that 
populates a Hadoop instance using a shared batch scheduler. HOD will find a 
requested number of nodes and start up Hadoop daemons on them. Users map reduce 
jobs can then run on the hadoop instance. After the job is done, HOD gives back 
the  nodes to the shared batch scheduler. A group of users will use HOD to 
acquire Hadoop instances of varying sizes and the batch scheduler will schedule 
requests in a way that important jobs gain more importance/resources and finish 
fast. Here are a list of requirements for HOD and batch schedulers:

Key Requirements :

--- Should allocate the specified minimum number of nodes for a job 
   Many batch jobs can finish in time, only when enough resources are 
allocated. Therefore batch scheduler should allocate the asked number of nodes 
for a given job when the job starts. This is simple form of what's known as 
gang    scheduling.

  Often the minimum nodes are not available right away, especially if the job 
asked for a large number. The batch scheduler should support advance 
reservation for important jobs so that the wait time can be determined. In 
advance   reservation, a reservation is created on earliest future point when 
the preoccupied nodes become available. When nodes are currently idle but 
booked by future reservations, batch scheduler is ok to give them to other jobs 
to increase system utilization, but only when doing so does not delay existing 
reservations.

--- run short urgent job without costing too much loss to long job. Especially, 
should not kill job tracker of long job. 
  Some jobs, mostly short ones, are time sensitive and need urgent treatment. 
Often, large portion of cluster nodes will be occupied by long running jobs. 
Batch scheduler should be able to preempt long jobs and run urgent jobs. Then, 
urgent jobs will finish quickly and long jobs can re-gain the nodes afterward. 

When preemption happens, HOD should minimize the loss to long jobs. Especially, 
it should not kill job tracker of long job.

--- be able to dial up, at run time, share of resources for more important 
projects.
  Viewed at high level, a given cluster is shared by multiple projects. A 
project consists of a number of jobs submitted by a group of users.Batch 
scheduler should allow important projects to have more resources. This should 
be tunable at run time as what projects deem more important may change over 
time. 

--- prevent malicious abuse of the system. 
  A shared cluster environment can be put in jeopardy if malicious or erroneous 
job code does: 
 -- hold unneeded resources for a long period 
 -- use privileges for unworthy work 
  Such abuse can easily cause under-utilization or starvation of other jobs. 
Batch scheduler should allow  setting up policies for preventing resource abuse 
by: 
 -- limit privileges to legitimate uses asking for proper amount 
 -- throttle peak use of resources per player 
 -- monitor and reduce starvation 

--- The behavior should be simple and predictable 
   When status of the system is queried, we should be able to determine what 
factors caused it to reach current status and what could be the future behavior 
with or without our tuning on the system. 

--- be portable to major resource managers 
   HOD design should be portable so that in future we are able to plugin other 
resource manager. 

Some of the key requirements are implemented by the batch schedulers. The 
others need to be implemented by HOD.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to