[
https://issues.apache.org/jira/browse/HADOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602247#action_12602247
]
Vivek Ratan commented on HADOOP-3421:
-------------------------------------
Regarding Reqs 3.1 and 3.2: In Hadoop today, if a user does not assign a
priority to a job when the job is submitted, the system assigns a default job
priority. JobConf::getJobPriority() returns JobPriority.NORMAL if no priority
has been set in the job's config file (see the short sketch at the end of this
comment). This means that, as implemented today, a job always has a priority
once it is read into Hadoop. That seems reasonable enough to me. To avoid
changing things around, the two reqs should be modified as follows:
*3.1*. Jobs have priorities associated with them (users can optionally assign a
priority to a job, or else the system assigns a default priority). For V1, we
support the same set of priorities available to MR jobs today.
*3.2*. Queues can optionally support priorities for jobs. By default, a queue
does not support priorities, in which case it will ignore (with a warning) any
priority levels specified by jobs submitted to it. If a queue does support
priorities, it will respect the priority set for the job.
I realize that we're tweaking a requirement based on the current
implementation, but the spirit of the requirement was that users continue to
submit jobs as before and the system's behavior does not change. I think that
is captured better by the newer set of requirements.
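To illustrate the current default behavior, here's a minimal sketch around the
JobConf call mentioned above (nothing beyond what is already stated in this
comment):
{code:java}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

public class DefaultPriorityExample {
  public static void main(String[] args) {
    // A fresh JobConf with no priority set in the job's config file.
    JobConf conf = new JobConf();
    // getJobPriority() falls back to NORMAL when nothing was specified,
    // so every job ends up with a priority once it is read into Hadoop.
    JobPriority p = conf.getJobPriority();
    System.out.println("Effective priority: " + p);  // prints NORMAL
  }
}
{code}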
> Requirements for a Resource Manager for Hadoop
> ----------------------------------------------
>
> Key: HADOOP-3421
> URL: https://issues.apache.org/jira/browse/HADOOP-3421
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Vivek Ratan
>
> This is a proposal to extend the scheduling functionality of Hadoop to allow
> sharing of large clusters without the use of HOD. We're suffering from
> performance issues with HOD and not finding it the right model for running
> jobs. We have concluded that a native Hadoop Resource Manager would be more
> useful to many people if it supported the features we need for sharing
> clusters across large groups and organizations.
> Below are the key requirements for a Resource Manager for Hadoop. First, some
> terminology used in this writeup:
> * *RM*: Resource Manager. What we're building.
> * *MR*: Map Reduce.
> * A *job* is an MR job for now, but can be any request. Jobs are submitted by
> users to the Grid. MR jobs are made up of units of computation called *tasks*.
> * A grid has a variety of *resources* of different *capacities* that are
> allocated to tasks. For the early version of the grid, the only resource
> considered is a Map or Reduce slot, which can execute a task. Each slot can
> run one or more tasks. Later versions may look at resources such as local
> temporary storage or CPUs.
> * *V1*: version 1. Some features are simplified for V1.
> h3. Orgs, queues, users, jobs
> Organizations (*Orgs*) are distinct entities for administration,
> configuration, billing and reporting purposes. *Users* belong to Orgs. Orgs
> have *queues* of jobs, where a queue represents a collection of jobs that
> share some scheduling criteria.
> * *1.1.* For V1, each queue will belong to one Org and each Org will have
> one queue.
> * *1.2.* Jobs are submitted to queues. A single job can be submitted to
> only one queue. It follows that a job will have a user and an Org associated
> with it.
> * *1.3.* A user can belong to multiple Orgs and can potentially submit
> jobs to multiple queues.
> * *1.4.* Orgs are guaranteed a fraction of the capacity of the grid (their
> 'guaranteed capacity'), in the sense that a certain capacity of resources will
> be at their disposal. All jobs submitted to the queues of an Org will have
> access to the capacity guaranteed to the Org.
> ** Note: it is expected that the sum of the guaranteed capacities of all
> Orgs equals the total resources in the Grid. If the sum is lower, some
> resources will not be used. If the sum is higher, the RM cannot maintain
> guarantees for all Orgs (see the sketch at the end of this section).
> * *1.5.* At any given time, free resources can be allocated to any Org
> beyond its guaranteed capacity. For example, this may be done in proportion
> to the guaranteed capacities of the various Orgs, or in some other way.
> However, these excess allocated resources can be reclaimed and made available
> to another Org in order to meet its capacity guarantee.
> * *1.6.* Within N minutes of an Org reclaiming resources, it should have
> all of its guaranteed capacity available. Put another way, the system will
> guarantee that excess resources taken from an Org are restored to it within
> N minutes of its need for them.
> * *1.7.* Queues have access control. A queue can specify which users are
> (or are not) allowed to submit jobs to it. A user's job submission will be
> rejected if the user does not have access rights to the queue.
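> A minimal sketch of the check implied by the note under req 1.4 (hypothetical
> names; not an actual RM API):
> {code:java}
> import java.util.Map;
>
> // Hypothetical helper: warns when the configured guarantees do not add up
> // to the total grid capacity (see the note under req 1.4).
> public class CapacityCheck {
>   /** guaranteedSlots maps each Org name to its guaranteed capacity in slots. */
>   static void checkGuarantees(Map<String, Integer> guaranteedSlots, int gridSlots) {
>     int sum = 0;
>     for (int orgSlots : guaranteedSlots.values()) {
>       sum += orgSlots;
>     }
>     if (sum < gridSlots) {
>       System.out.println((gridSlots - sum) + " slots will not be used by any Org.");
>     } else if (sum > gridSlots) {
>       System.out.println("Guarantees exceed grid capacity by " + (sum - gridSlots)
>           + " slots; the RM cannot maintain all guarantees.");
>     }
>   }
> }
> {code}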
> h3. Job capacity
> * *2.1.* Users will just submit jobs to the Grid. They do not need to
> specify the capacity required for their jobs (i.e. how many parallel tasks
> the job needs). [Most MR jobs are elastic and do not require a fixed number
> of parallel tasks to run - they can run with as little or as much task
> parallelism as they can get. This amount of task parallelism is usually
> limited by the number of mappers required (which is computed by the system
> and not by the user) or the amount of free resources available in the grid.
> In most cases, the user wants to just submit a job and let the system take
> care of utilizing as many or as few resources as it can.]
> h3. Priorities
> * *3.1.* Jobs can optionally have priorities associated with them. For V1,
> we support the same set of priorities available to MR jobs today.
> * *3.2.* Queues can optionally support priorities for jobs. By default, a
> queue does not support priorities, in which case it will ignore (with a
> warning) any priority levels specified by jobs submitted to it. If a queue
> does support priorities, it will have a default priority associated with it,
> which is assigned to jobs that don't have priorities. Reqs 3.1 and 3.2
> together mean this: if a queue supports priorities, then a job is assigned
> the queue's default priority if it doesn't have one specified, else the job's
> specified priority is used. If a queue does not support priorities, then it
> ignores priorities specified for jobs (see the sketch at the end of this
> section).
> * *3.3.* Within a queue, jobs with higher priority will have access to the
> queue's resources before jobs with lower priority. However, once a job is
> running, it will not be preempted (i.e., stopped and restarted) for a higher
> priority job. What this also means is that comparison of priorities makes
> sense within queues, and not across them.
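> A minimal sketch of the priority-resolution behavior in reqs 3.1 and 3.2
> (hypothetical names; only JobPriority is the existing MR enum):
> {code:java}
> import org.apache.hadoop.mapred.JobPriority;
>
> // Hypothetical illustration of reqs 3.1/3.2; not an actual RM class.
> public class QueuePriorityPolicy {
>   private final boolean supportsPriorities;
>   private final JobPriority queueDefault;  // used only if supportsPriorities
>
>   QueuePriorityPolicy(boolean supportsPriorities, JobPriority queueDefault) {
>     this.supportsPriorities = supportsPriorities;
>     this.queueDefault = queueDefault;
>   }
>
>   /** jobPriority is null if the submitter did not specify one. */
>   JobPriority effectivePriority(JobPriority jobPriority) {
>     if (!supportsPriorities) {
>       if (jobPriority != null) {
>         System.err.println("Warning: queue ignores job priority " + jobPriority);
>       }
>       return null;  // scheduling in this queue is priority-blind
>     }
>     return (jobPriority != null) ? jobPriority : queueDefault;
>   }
> }
> {code}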
> h3. Fairness/limits
> * *4.1.* In order to prevent one or more users from monopolizing its
> resources, each queue enforces a limit on the percentage of resources
> allocated to a user at any given time, if there is competition for them. This
> user limit can vary between a minimum and maximum value. For V1, all users
> have the same limit, whose maximum value is dictated by the number of users
> who have submitted jobs, and whose minimum value is a pre-configured value
> UL-MIN. For example, suppose UL-MIN is 25. If two users have submitted jobs
> to a queue, no single user can use more than 50% of the queue resources. If a
> third user submits a job, no single user can use more than 33% of the queue
> resources. With 4 or more users, no user can use more than 25% of the queue's
> resources (see the sketch at the end of this section).
> ** Limits apply only to new scheduling decisions, i.e., running jobs or
> tasks will not be preempted.
> ** The value of UL-MIN can be set differently per Org.
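> A minimal sketch of the user-limit arithmetic in req 4.1 (hypothetical names;
> not an actual RM API):
> {code:java}
> // Hypothetical illustration of req 4.1.
> public class UserLimit {
>   /**
>    * Maximum percentage of the queue's resources any single user may hold,
>    * given the configured minimum UL-MIN and the number of users who have
>    * submitted jobs to the queue.
>    */
>   static int userLimitPercent(int ulMin, int numUsersWithJobs) {
>     return Math.max(ulMin, 100 / numUsersWithJobs);
>   }
>
>   public static void main(String[] args) {
>     int ulMin = 25;
>     System.out.println(userLimitPercent(ulMin, 2));  // 50
>     System.out.println(userLimitPercent(ulMin, 3));  // 33
>     System.out.println(userLimitPercent(ulMin, 5));  // 25
>   }
> }
> {code}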
> h3. Job queue interaction
> * *5.1.* Interaction with the Job queue should be through a command line
> interface and a web-based GUI.
> * *5.2.* All queues are visible to all users. The Web UI will provide a
> single-page view of all queues.
> * *5.3.* Users should be able to delete their queued jobs at any time.
> * *5.4.* Users should be able to see capacity statistics for various Orgs
> (what is the capacity allocated, how much is being used, etc.)
> * *5.5.* Existing utilities, such as the *hadoop job -list* command,
> should be enhanced to show additional relevant attributes, e.g., the queue
> associated with the job.
> h3. Accounting
> * *6.1.* The RM must provide accounting information in a manner that can
> be easily consumed by external plug-ins or utilities to integrate with 3rd
> party accounting systems. The accounting information should comprise the
> following (see the sketch at the end of this section):
> ** Username running the Hadoop job,
> ** job id,
> ** job name,
> ** queue to which job was submitted and organization owning the queue,
> ** number of resource units (e.g. slots) used,
> ** number of maps / reduces,
> ** timings - time of entry into the queue, start and end times of the
> job, perhaps total node hours,
> ** status of the job - success, failed, killed, etc.
> * *6.2.* To assist deployments which do not require accounting, it should
> be possible to turn off this feature.
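> A minimal sketch of a per-job accounting record carrying the fields listed in
> req 6.1 (hypothetical names; the plug-in interface itself is not specified
> here):
> {code:java}
> // Hypothetical record grouping the accounting fields from req 6.1.
> public class JobAccountingRecord {
>   String user;        // username running the Hadoop job
>   String jobId;
>   String jobName;
>   String queue;       // queue the job was submitted to
>   String org;         // organization owning the queue
>   long slotsUsed;     // number of resource units (e.g. slots) used
>   int numMaps;
>   int numReduces;
>   long queuedTime;    // time of entry into the queue, ms since epoch
>   long startTime;
>   long endTime;
>   double nodeHours;   // perhaps total node hours
>   String status;      // SUCCESS, FAILED, KILLED, ...
> }
> {code}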
> h3. Availability
> * *7.1.* Job state needs to be persisted (RM restarts should not cause jobs
> to die).
> h3. Scalability
> * *8.1.* Scale to 3k+ nodes
> * *8.2.* Scale to handle 1k+ large submitted jobs, each with 100k+ tasks
> h3. Configuration
> * *9.1.* The system must provide a mechanism to create and delete
> organizations, and queues within the organizations. It must also provide a
> mechanism to configure various properties of these objects.
> * *9.2.* Only users with administrative privileges can perform operations
> of managing and configuring these objects in the system.
> * *9.3.* Configuration changes must take effect in the RM without
> requiring a restart. They must take effect within a reasonable amount of
> time after the modification is made.
> * *9.4.* For most of the configurations, it appears that there can be
> values at various levels - Grid, organization, queue, user and job. For
> example, there can be a default value for the per-user resource quota at the
> Grid level, which can be overridden at the org level, and so on. There must
> be an easy way to express these configurations in this hierarchical fashion,
> with values at a broader level overridden by values at a narrower level (see
> the sketch at the end of this section).
> * *9.5.* There must be appropriate default objects and default values for
> their configuration. This is to help deployments that do not need a complex
> scheduling setup.
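> A minimal sketch of the hierarchical lookup described in req 9.4 (hypothetical
> names; the actual configuration format is left open):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import java.util.Map;
>
> // Hypothetical illustration of req 9.4: a value set at a narrower level
> // (job, user, queue, org) overrides the same key set at a broader level.
> public class HierarchicalConfig {
>   static String resolve(String key,
>                         Map<String, String> grid, Map<String, String> org,
>                         Map<String, String> queue, Map<String, String> user,
>                         Map<String, String> job) {
>     List<Map<String, String>> narrowToBroad =
>         Arrays.asList(job, user, queue, org, grid);
>     for (Map<String, String> level : narrowToBroad) {
>       if (level != null && level.containsKey(key)) {
>         return level.get(key);   // narrowest level that defines the key wins
>       }
>     }
>     return null;                 // no level defines the key
>   }
> }
> {code}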
> h3. Logging Enhancements
> * *10.1.* For purposes of debugging, the Hadoop web UI should provide a
> facility to see details of all jobs. While this is mostly supported today,
> any changes to meet other requirements, such as scalability, must not affect
> this feature. Also, it must be possible to view task logs from the Job
> history UI (see HADOOP:2165).
> * *10.2.* The system must log all relevant events about a job vis-a-vis
> scheduling. Particularly, changes in state of a job (queued -> scheduled ->
> completed | killed), and events which caused these changes must be logged.
> * *10.3.* The system should be able to provide relevant, explanatory
> information about the status of a job to give feedback to users. This could
> be a diagnostic string explaining why the job is queued or why it failed
> (for example, lack of sufficient resources: how many were requested, how many
> are available, whether user limits were exceeded, etc.). This information
> must be available to users, as well as in the logs for debugging purposes. It
> should also be possible to get this information programmatically.
> * *10.4.* The host which submitted the job should be part of log messages.
> This would assist in debugging.
> h3. Security Enhancements
> * *11.1.* The RM should provide a mechanism for controlling who can submit
> jobs to which queue. This could be done using an ACL mechanism that consists
> of an ordered whitelist and blacklist of users, where the order determines
> which entry applies in case of conflicts (see the sketch at the end of this
> section).
> * *11.2.* The system must provide a mechanism to list users who have
> administrative control. Only users in this list should be allowed to perform
> administrative operations on the RM, such as changing configuration, setting
> up objects, etc.
> * *11.3.* The system should be able to schedule tasks running on behalf of
> multiple users concurrently on the same host in a secure manner.
> Specifically, this should not require any insecure configuration, such as
> requiring 0777 permissions on directories etc.
> * *11.4.* The system must follow the security mechanisms being implemented
> for Hadoop (HADOOP:1701 and friends).
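> A minimal sketch of the ordered whitelist/blacklist check suggested in req
> 11.1 (hypothetical names; the actual ACL format is not decided here):
> {code:java}
> import java.util.List;
>
> // Hypothetical illustration of req 11.1: the ACL is an ordered list of
> // allow/deny entries, and the first entry matching the user decides.
> public class QueueAcl {
>   static class Entry {
>     final String user;    // a username, or "*" for all users
>     final boolean allow;
>     Entry(String user, boolean allow) { this.user = user; this.allow = allow; }
>   }
>
>   private final List<Entry> entries;
>   private final boolean defaultAllow;  // applies when no entry matches
>
>   QueueAcl(List<Entry> entries, boolean defaultAllow) {
>     this.entries = entries;
>     this.defaultAllow = defaultAllow;
>   }
>
>   boolean canSubmit(String user) {
>     for (Entry e : entries) {
>       if (e.user.equals("*") || e.user.equals(user)) {
>         return e.allow;            // first match wins
>       }
>     }
>     return defaultAllow;
>   }
> }
> {code}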