[
https://issues.apache.org/jira/browse/HADOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602247#action_12602247
]
Vivek Ratan commented on HADOOP-3421:
-------------------------------------
Regarding Reqs 3.1 and 3.2: In Hadoop today, if a user does not assign a
priority to a job when the job is submitted, the system assigns a default job
priority. JobConf::getJobPriority() returns JobPriority.NORMAL if no priority
has been set in the job's config file (see the short sketch at the end of this
comment). This means that, as implemented today, a job always has a priority
once it is read into Hadoop. That seems reasonable enough to me. To avoid
changing things around, the two reqs should be modified as follows:
*3.1*. Jobs have priorities associated with them (users can optionally assign a
priority to a job, or else the system assigns a default priority). For V1, we
support the same set of priorities available to MR jobs today.
*3.2*. Queues can optionally support priorities for jobs. By default, a queue
does not support priorities, in which case it will ignore (with a warning) any
priority levels specified by jobs submitted to it. If a queue does support
priorities, it will respect the priority set for the job.
I realize that we're tweaking a requirement based on the current
implementation, but the spirit of the requirement was that users continue to
submit jobs as before and the system's behavior does not change. I think that
is captured better by the newer set of requirements.
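To illustrate the current default behavior, here's a minimal sketch around the
JobConf call mentioned above (nothing beyond what is already stated in this
comment):
{code:java}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

public class DefaultPriorityExample {
  public static void main(String[] args) {
    // A fresh JobConf with no priority set in the job's config file.
    JobConf conf = new JobConf();
    // getJobPriority() falls back to NORMAL when nothing was specified,
    // so every job ends up with a priority once it is read into Hadoop.
    JobPriority p = conf.getJobPriority();
    System.out.println("Effective priority: " + p);  // prints NORMAL
  }
}
{code}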
> Requirements for a Resource Manager for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-3421
> URL: https://issues.apache.org/jira/browse/HADOOP-3421
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Vivek Ratan
>
> This is a proposal to extend the scheduling functionality of Hadoop to allow
> sharing of large clusters without the use of HOD. We're suffering from
> performance issues with HOD and not finding it the right model for running
> jobs. We have concluded that a native Hadoop Resource Manager would be more
> useful to many people if it supported the features we need for sharing
> clusters across large groups and organizations.
> Below are the key requirements for a Resource Manager for Hadoop. First, some
> terminology used in this writeup:
> * *RM*: Resource Manager. What we're building.
> * *MR*: Map Reduce.
> * A *job* is an MR job for now, but can be any request. Jobs are submitted by
> users to the Grid. MR jobs are made up of units of computation called *tasks*.
> * A grid has a variety of *resources* of different *capacities* that are
> allocated to tasks. For the early version of the grid, the only resource
> considered is a Map or Reduce slot, which can execute a task. Each slot can
> run one or more tasks. Later versions may look at resources such as local
> temporary storage or CPUs.
> * *V1*: version 1. Some features are simplified for V1.
> h3. Orgs, queues, users, jobs
> Organizations (*Orgs*) are distinct entities for administration,
> configuration, billing and reporting purposes. *Users* belong to Orgs. Orgs
> have *queues* of jobs, where a queue represents a collection of jobs that
> share some scheduling criteria.
> * *1.1.* For V1, each queue will belong to one Org and each Org will have
> one queue.
> * *1.2.* Jobs are submitted to queues. A single job can be submitted to
> only one queue. It follows that a job will have a user and an Org associated
> with it.
> * *1.3.* A user can belong to multiple Orgs and can potentially submit
> jobs to multiple queues.
> * *1.4.* Orgs are guaranteed a fraction of the capacity of the grid (their
> 'guaranteed capacity'), in the sense that a certain capacity of resources will
> be at their disposal. All jobs submitted to the queues of an Org will have
> access to the capacity guaranteed to the Org.
> ** Note: it is expected that the sum of the guaranteed capacities of all
> Orgs equals the total resources in the Grid. If the sum is lower, some
> resources will not be used. If the sum is higher, the RM cannot maintain
> guarantees for all Orgs (see the sketch at the end of this section).
> * *1.5.* At any given time, free resources can be allocated to any Org
> beyond its guaranteed capacity. For example, this may be done in proportion
> to the guaranteed capacities of the various Orgs, or in some other way.
> However, these excess allocated resources can be reclaimed and made available
> to another Org in order to meet its capacity guarantee.
> * *1.6.* Within N minutes of an Org reclaiming resources, it should have
> all of its guaranteed capacity available. Put another way, the system will
> guarantee that excess resources taken from an Org are restored to it within
> N minutes of its need for them.
> * *1.7.* Queues have access control. A queue can specify which users are
> (or are not) allowed to submit jobs to it. A user's job submission will be
> rejected if the user does not have access rights to the queue.
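> A minimal sketch of the check implied by the note under req 1.4 (hypothetical
> names; not an actual RM API):
> {code:java}
> import java.util.Map;
>
> // Hypothetical helper: warns when the configured guarantees do not add up
> // to the total grid capacity (see the note under req 1.4).
> public class CapacityCheck {
>   /** guaranteedSlots maps each Org name to its guaranteed capacity in slots. */
>   static void checkGuarantees(Map<String, Integer> guaranteedSlots, int gridSlots) {
>     int sum = 0;
>     for (int orgSlots : guaranteedSlots.values()) {
>       sum += orgSlots;
>     }
>     if (sum < gridSlots) {
>       System.out.println((gridSlots - sum) + " slots will not be used by any Org.");
>     } else if (sum > gridSlots) {
>       System.out.println("Guarantees exceed grid capacity by " + (sum - gridSlots)
>           + " slots; the RM cannot maintain all guarantees.");
>     }
>   }
> }
> {code}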
> h3. Job capacity
> * *2.1.* Users will just submit jobs to the Grid. They do not need to
> specify the capacity required for their jobs (i.e. how many parallel tasks
> the job needs). [Most MR jobs are elastic and do not require a fixed number
> of parallel tasks to run - they can run with as little or as much task
> parallelism as they can get. This amount of task parallelism is usually
> limited by the number of mappers required (which is computed by the system
> and not by the user) or the amount of free resources available in the grid.
> In most cases, the user wants to just submit a job and let the system take
> care of utilizing as many or as few resources as it can.]
> h3. Priorities
> * *3.1.* Jobs can optionally have priorities associated with them. For V1,
> we support the same set of priorities available to MR jobs today.
> * *3.2.* Queues can optionally support priorities for jobs. By default, a
> queue does not support priorities, in which case it will ignore (with a
> warning) any priority levels specified by jobs submitted to it. If a queue
> does support priorities, it will have a default priority associated with it,
> which is assigned to jobs that don't have priorities. Reqs 3.1 and 3.2
> together mean this: if a queue supports priorities, then a job is assigned
> the queue's default priority if it doesn't have one specified, else the job's
> specified priority is used. If a queue does not support priorities, then it
> ignores priorities specified for jobs (see the sketch at the end of this
> section).
> * *3.3.* Within a queue, jobs with higher priority will have access to the
> queue's resources before jobs with lower priority. However, once a job is
> running, it will not be preempted (i.e., stopped and restarted) for a higher
> priority job. What this also means is that comparison of priorities makes
> sense within queues, and not across them.
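> A minimal sketch of the priority-resolution behavior in reqs 3.1 and 3.2
> (hypothetical names; only JobPriority is the existing MR enum):
> {code:java}
> import org.apache.hadoop.mapred.JobPriority;
>
> // Hypothetical illustration of reqs 3.1/3.2; not an actual RM class.
> public class QueuePriorityPolicy {
>   private final boolean supportsPriorities;
>   private final JobPriority queueDefault;  // used only if supportsPriorities
>
>   QueuePriorityPolicy(boolean supportsPriorities, JobPriority queueDefault) {
>     this.supportsPriorities = supportsPriorities;
>     this.queueDefault = queueDefault;
>   }
>
>   /** jobPriority is null if the submitter did not specify one. */
>   JobPriority effectivePriority(JobPriority jobPriority) {
>     if (!supportsPriorities) {
>       if (jobPriority != null) {
>         System.err.println("Warning: queue ignores job priority " + jobPriority);
>       }
>       return null;  // scheduling in this queue is priority-blind
>     }
>     return (jobPriority != null) ? jobPriority : queueDefault;
>   }
> }
> {code}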
> h3. Fairness/limits
> * *4.1.* In order to prevent one or more users from monopolizing its
> resources, each queue enforces a limit on the percentage of resources
> allocated to a user at any given time, if there is competition for them. This
> user limit can vary between a minimum and maximum value. For V1, all users
> have the same limit, whose maximum value is dictated by the number of users
> who have submitted jobs, and whose minimum value is a pre-configured value
> UL-MIN. For example, suppose UL-MIN is 25. If two users have submitted jobs
> to a queue, no single user can use more than 50% of the queue resources. If a
> third user submits a job, no single user can use more than 33% of the queue
> resources. With 4 or more users, no user can use more than 25% of the queue's
> resources (see the sketch at the end of this section).
> ** Limits apply only to new scheduling decisions, i.e., running jobs or
> tasks will not be preempted.
> ** The value of UL-MIN can be set differently per Org.
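> A minimal sketch of the user-limit arithmetic in req 4.1 (hypothetical names;
> not an actual RM API):
> {code:java}
> // Hypothetical illustration of req 4.1.
> public class UserLimit {
>   /**
>    * Maximum percentage of the queue's resources any single user may hold,
>    * given the configured minimum UL-MIN and the number of users who have
>    * submitted jobs to the queue.
>    */
>   static int userLimitPercent(int ulMin, int numUsersWithJobs) {
>     return Math.max(ulMin, 100 / numUsersWithJobs);
>   }
>
>   public static void main(String[] args) {
>     int ulMin = 25;
>     System.out.println(userLimitPercent(ulMin, 2));  // 50
>     System.out.println(userLimitPercent(ulMin, 3));  // 33
>     System.out.println(userLimitPercent(ulMin, 5));  // 25
>   }
> }
> {code}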
> h3. Job queue interaction
> * *5.1.* Interaction with the Job queue should be through a command line
> interface and a web-based GUI.
> * *5.2.* All queues are visible to all users. The Web UI will provide a
> single-page view of all queues.
> * *5.3.* Users should be able to delete their queued jobs at any time.
> * *5.4.* Users should be able to see capacity statistics for various Orgs
> (what is the capacity allocated, how much is being used, etc.)
> * *5.5.* Existing utilities, such as the *hadoop job -list* command,
> should be enhanced to show additional relevant attributes, e.g., the queue
> associated with the job.
> h3. Accounting
> * *6.1.* The RM must provide accounting information in a manner that can
> be easily consumed by external plug-ins or utilities to integrate with 3rd
> party accounting systems. The accounting information should comprise the
> following (see the sketch at the end of this section):
> ** Username running the Hadoop job,
> ** job id,
> ** job name,
> ** queue to which job was submitted and organization owning the queue,
> ** number of resource units (e.g. slots) used,
> ** number of maps / reduces,
> ** timings - time of entry into the queue, start and end times of the
> job, perhaps total node hours,
> ** status of the job - success, failed, killed, etc.
> * *6.2.* To assist deployments which do not require accounting, it should
> be possible to turn off this feature.
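> A minimal sketch of a per-job accounting record carrying the fields listed in
> req 6.1 (hypothetical names; the plug-in interface itself is not specified
> here):
> {code:java}
> // Hypothetical record grouping the accounting fields from req 6.1.
> public class JobAccountingRecord {
>   String user;        // username running the Hadoop job
>   String jobId;
>   String jobName;
>   String queue;       // queue the job was submitted to
>   String org;         // organization owning the queue
>   long slotsUsed;     // number of resource units (e.g. slots) used
>   int numMaps;
>   int numReduces;
>   long queuedTime;    // time of entry into the queue, ms since epoch
>   long startTime;
>   long endTime;
>   double nodeHours;   // perhaps total node hours
>   String status;      // SUCCESS, FAILED, KILLED, ...
> }
> {code}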
> h3. Availability
> * *7.1.* Job state needs to be persisted (RM restarts should not cause jobs
> to die).
> h3. Scalability
> * *8.1.* Scale to 3k+ nodes
> * *8.2.* Scale to handle 1k+ large submitted jobs, each with 100k+ tasks
> h3. Configuration
> * *9.1.* The system must provide a mechanism to create and delete
> organizations, and queues within the organizations. It must also provide a
> mechanism to configure various properties of these objects.
> * *9.2.* Only users with administrative privileges can perform operations
> of managing and configuring these objects in the system.
> * *9.3.* Configuration changes must take effect in the RM without
> requiring a restart. They must take effect within a reasonable amount of
> time after the modification is made.
> * *9.4.* For most of the configurations, it appears that there can be
> values at various levels - Grid, organization, queue, user and job. For
> example, there can be a default value for the per-user resource quota at the
> Grid level, which can be overridden at the org level, and so on. There must
> be an easy way to express these configurations in this hierarchical fashion,
> with values at a broader level overridden by values at a narrower level (see
> the sketch at the end of this section).
> * *9.5.* There must be appropriate default objects and default values for
> their configuration. This is to help deployments that do not need a complex
> scheduling setup.
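> A minimal sketch of the hierarchical lookup described in req 9.4 (hypothetical
> names; the actual configuration format is left open):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import java.util.Map;
>
> // Hypothetical illustration of req 9.4: a value set at a narrower level
> // (job, user, queue, org) overrides the same key set at a broader level.
> public class HierarchicalConfig {
>   static String resolve(String key,
>                         Map<String, String> grid, Map<String, String> org,
>                         Map<String, String> queue, Map<String, String> user,
>                         Map<String, String> job) {
>     List<Map<String, String>> narrowToBroad =
>         Arrays.asList(job, user, queue, org, grid);
>     for (Map<String, String> level : narrowToBroad) {
>       if (level != null && level.containsKey(key)) {
>         return level.get(key);   // narrowest level that defines the key wins
>       }
>     }
>     return null;                 // no level defines the key
>   }
> }
> {code}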
> h3. Logging Enhancements
> * *10.1.* For purposes of debugging, the Hadoop web UI should provide a
> facility to see details of all jobs. While this is mostly supported today,
> any changes to meet other requirements, such as scalability, must not affect
> this feature. Also, it must be possible to view task logs from the Job
> history UI (see HADOOP:2165).
> * *10.2.* The system must log all relevant events about a job vis-a-vis
> scheduling. Particularly, changes in state of a job (queued -> scheduled ->
> completed | killed), and events which caused these changes must be logged.
> * *10.3.* The system should be able to provide relevant, explanatory
> information about the status of a job to give feedback to users. This could
> be a diagnostic string explaining why the job is queued or why it failed
> (for example, lack of sufficient resources: how many were requested, how many
> are available, whether user limits were exceeded, etc.). This information
> must be available to users, as well as in the logs for debugging purposes. It
> should also be possible to get this information programmatically.
> * *10.4.* The host which submitted the job should be part of log messages.
> This would assist in debugging.
> h3. Security Enhancements
> * *11.1.* The RM should provide a mechanism for controlling who can submit
> jobs to which queue. This could be done using an ACL mechanism that consists
> of an ordered whitelist and blacklist of users, where the order determines
> which entry applies in case of conflicts (see the sketch at the end of this
> section).
> * *11.2.* The system must provide a mechanism to list users who have
> administrative control. Only users in this list should be allowed to perform
> administrative operations on the RM, such as changing configuration, setting
> up objects, etc.
> * *11.3.* The system should be able to schedule tasks running on behalf of
> multiple users concurrently on the same host in a secure manner.
> Specifically, this should not require any insecure configuration, such as
> requiring 0777 permissions on directories etc.
> * *11.4.* The system must follow the security mechanisms being implemented
> for Hadoop (HADOOP:1701 and friends).
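> A minimal sketch of the ordered whitelist/blacklist check suggested in req
> 11.1 (hypothetical names; the actual ACL format is not decided here):
> {code:java}
> import java.util.List;
>
> // Hypothetical illustration of req 11.1: the ACL is an ordered list of
> // allow/deny entries, and the first entry matching the user decides.
> public class QueueAcl {
>   static class Entry {
>     final String user;    // a username, or "*" for all users
>     final boolean allow;
>     Entry(String user, boolean allow) { this.user = user; this.allow = allow; }
>   }
>
>   private final List<Entry> entries;
>   private final boolean defaultAllow;  // applies when no entry matches
>
>   QueueAcl(List<Entry> entries, boolean defaultAllow) {
>     this.entries = entries;
>     this.defaultAllow = defaultAllow;
>   }
>
>   boolean canSubmit(String user) {
>     for (Entry e : entries) {
>       if (e.user.equals("*") || e.user.equals(user)) {
>         return e.allow;            // first match wins
>       }
>     }
>     return defaultAllow;
>   }
> }
> {code}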