[jira] Commented: (HADOOP-719) Integration of Hadoop with batch schedulers

Mahadev konar (JIRA) Tue, 14 Nov 2006 14:18:38 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-719?page=comments#action_12449827 ]

Mahadev konar commented on HADOOP-719:
--------------------------------------


-- Another key requirement for HOD is distributing the Hadoop jars onto the
cluster. Most batch schedulers are not effective at transferring files, so
HOD would require a service for transferring the Hadoop jars.

> Integration of Hadoop with batch schedulers
> -------------------------------------------
>
>                 Key: HADOOP-719
>                 URL: http://issues.apache.org/jira/browse/HADOOP-719
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/streaming
>            Reporter: Mahadev konar
>         Assigned To: Mahadev konar
>
> Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers
> such as Condor, Torque, Sun Grid, etc. HOD is a system that provisions a
> Hadoop instance using a shared batch scheduler. HOD will find the requested
> number of nodes and start up Hadoop daemons on them. Users' map-reduce jobs
> can then run on the Hadoop instance. After the job is done, HOD gives the
> nodes back to the shared batch scheduler. A group of users will use HOD to
> acquire Hadoop instances of varying sizes, and the batch scheduler will
> schedule requests so that important jobs receive higher priority and more
> resources and finish quickly. Here is a list of requirements for HOD and
> batch schedulers:
> Key Requirements:
> --- Should allocate the specified minimum number of nodes for a job.
>    Many batch jobs can finish on time only when enough resources are
> allocated. Therefore the batch scheduler should allocate the requested
> number of nodes to a job when the job starts. This is a simple form of what
> is known as gang scheduling.
>   Often the minimum number of nodes is not available right away, especially
> if the job asked for a large number. The batch scheduler should support
> advance reservation for important jobs so that the wait time can be
> determined. In advance reservation, a reservation is created at the earliest
> future point at which the currently occupied nodes become available. When
> nodes are currently idle but booked by future reservations, the batch
> scheduler may give them to other jobs to increase system utilization, but
> only when doing so does not delay existing reservations.
> --- Run short urgent jobs without imposing too much loss on long jobs.
> In particular, it should not kill the job tracker of a long job.
>   Some jobs, mostly short ones, are time sensitive and need urgent treatment.
> Often, a large portion of the cluster nodes will be occupied by long-running
> jobs. The batch scheduler should be able to preempt long jobs and run urgent
> jobs. The urgent jobs will then finish quickly, and the long jobs can regain
> the nodes afterward.
> When preemption happens, HOD should minimize the loss to long jobs.
> In particular, it should not kill the job tracker of a long job.
> --- Be able to dial up, at run time, the share of resources for more
> important projects.
>   Viewed at a high level, a given cluster is shared by multiple projects. A
> project consists of a number of jobs submitted by a group of users. The batch
> scheduler should allow important projects to have more resources. This should
> be tunable at run time, since which projects are deemed more important may
> change over time.
> --- Prevent malicious abuse of the system.
>   A shared cluster environment can be put in jeopardy if malicious or
> erroneous job code:
>  -- holds unneeded resources for a long period
>  -- uses privileges for unworthy work
>   Such abuse can easily cause under-utilization or starvation of other jobs.
> The batch scheduler should allow setting up policies that prevent resource
> abuse by:
>  -- limiting privileges to legitimate uses that ask for a proper amount
>  -- throttling peak use of resources per player
>  -- monitoring and reducing starvation
> --- The behavior should be simple and predictable.
>    When the status of the system is queried, we should be able to determine
> what factors caused it to reach its current state and what its future
> behavior would be with or without our tuning of the system.
> --- Be portable to major resource managers.
>    The HOD design should be portable so that in the future we are able to
> plug in other resource managers.
> Some of the key requirements are implemented by the batch schedulers. The
> others need to be implemented by HOD.
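
The advance-reservation rule in the first requirement (idle nodes booked by
a future reservation may be lent to other jobs only if that does not delay
the reservation) can be sketched as a simple admission check. This is only
an illustration under stated assumptions, not how HOD or any particular
batch scheduler implements it: the names Reservation, reserved_at and
can_backfill are hypothetical, times are plain integers, and the pool of
idle nodes is assumed to stay constant over the job's run time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reservation:
    """Hypothetical advance reservation on the idle node pool."""
    start: int  # time the reservation begins (arbitrary units)
    end: int    # time the reservation ends (exclusive)
    nodes: int  # number of nodes booked


def reserved_at(reservations: List[Reservation], t: int) -> int:
    """Number of nodes booked by reservations that are active at time t."""
    return sum(r.nodes for r in reservations if r.start <= t < r.end)


def can_backfill(idle_nodes: int, reservations: List[Reservation],
                 now: int, job_nodes: int, job_duration: int) -> bool:
    """Return True if a job may borrow idle nodes without delaying any
    existing reservation.

    Assumption: idle_nodes does not change while the job runs.
    """
    job_end = now + job_duration
    # The booked-node count only rises at reservation start times, so it is
    # enough to check the job's start and every reservation start that falls
    # inside the job's run window.
    checkpoints = [now] + [r.start for r in reservations
                           if now < r.start < job_end]
    return all(job_nodes + reserved_at(reservations, t) <= idle_nodes
               for t in checkpoints)
```

For example, with 10 idle nodes and a reservation of 8 nodes starting at
time 100, a 10-node job that ends by time 50 is admitted, while a 10-node
job that would still be running at time 100 is rejected. A real scheduler
would also need trustworthy run-time estimates, which this sketch takes for
granted.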

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
