[
https://issues.apache.org/jira/browse/HADOOP-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655280#action_12655280
]
sandholm edited comment on HADOOP-4768 at 12/10/08 11:18 AM:
--------------------------------------------------------------------
Thanks for your input.
Consumable quotas and budget accounting are a requirement that we have and
that is not supported by any of the schedulers today. They allow users
themselves to change regulated priorities that remain valid in a competitive
multi-user setting (where social peer pressure assumptions break down). The
idea here is that demand varies over time, as do users' job priority
preferences. When demand is high you would want to encourage only the most
important jobs to be run and give users with low-priority jobs an incentive to
hold off on submitting their jobs. Also note that the priorities a user sets
when they do not affect her in any way tend to be very different from the
priorities she would have to pay for in some way. Having access to a user's
'truthful' priorities allows the scheduler to do a more accurate job of
efficiently mapping users to available resources while taking current demand
into account.

Back to the implementation approach. As I mentioned above, one approach I
evaluated was to have a separate process that pushes the necessary changes to
the config files. The fact that the capacity scheduler currently doesn't
support dynamic updates of the config file is a minor issue in this context,
and I actually also used a patch that fixed this. The more important
showstopper for this approach was the need to replicate the whole reliable
Hadoop service infrastructure. We have implemented our own systems and services
to do much more involved budget accounting than this, but contributing that
whole package to Hadoop would be too much work, and not all of it may be useful
to the Hadoop community in general. So what I tried to do in this patch was to
extract the most important pieces from our previous work that solve the
above-mentioned problems, using as much of the existing Hadoop infrastructure
as possible.

Therefore, ideally we would like to plug in some code in the scheduler event
loop that allows us to set priorities (that have been paid for and that are
being accounted for towards a budget). Implementing our own scheduler
altogether was an option, but we are not so interested in, and don't have the
low-level expertise in, how the priorities should be enforced in the map/reduce
context. Hence, it seemed natural to reuse the fairshare or capacity scheduler
for this purpose. If we assume that we have a scheduler-collocated budget
algorithm, it seems very roundabout and difficult to support multiple priority
enforcers if we need to handle all the different configuration file formats of
the individual schedulers. Fiddling around with XPath would also add
configuration and parsing complexity, apart from limiting performance. A better
solution in my opinion would be to have a way for the plugin to communicate
priority updates directly to the scheduler within the given scheduler
framework. The only interface in the current code base I found that could be
used for this purpose was Configuration properties. This in-memory approach
also has the advantage that schedulers can implement more sophisticated
enforcement of shares paid for by users, as both Vivek and Matei alluded to
above.

To summarize, my requirements for the scheduling framework are as follows:
- A scheduler-independent plug point in the job tracker event loop to host the
  budget accounting algorithm and to communicate paid-for shares to
  resource-share enforcers such as the existing two schedulers
- A scheduler-independent interface to communicate paid-for shares to
  resource-share enforcers (this could still be 'standardized' XML config files
  if you find that appropriate, but it has the performance and complexity
  implications I mentioned above)

The patch I submitted may not solve these problems in the absolute optimal way
because I didn't want to change any interfaces in core or the scheduler
framework itself. It represents my understanding of the simplest way to address
these issues with the current interfaces, though, and it is a first attempt to
contribute our work to the Hadoop community. Our fallback is to just pick one
scheduler and modify the config file from within our system, but we would not
contribute anything to the community then and we would be left with a brittle
interface to a specific scheduler.

I will also talk through these issues with Owen and Arun when I meet them on
Thursday and report back here.
> Dynamic Priority Scheduler that allows queue shares to be controlled
> dynamically by a currency
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-4768
> URL: https://issues.apache.org/jira/browse/HADOOP-4768
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/capacity-sched, contrib/fair-share
> Affects Versions: 0.20.0
> Reporter: Thomas Sandholm
> Assignee: Thomas Sandholm
> Fix For: 0.20.0
>
> Attachments: HADOOP-4768-capacity-scheduler.patch,
> HADOOP-4768-dynamic-scheduler.patch, HADOOP-4768-fairshare.patch,
> HADOOP-4768.patch
>
>
> Contribution based on work presented at the Hadoop User Group meeting in
> Santa Clara in September and the HadoopCamp in New Orleans in November.
> From README:
> This package implements dynamic priority scheduling for MapReduce jobs.
> Overview
> --------
> The purpose of this scheduler is to allow users to increase and decrease
> their queue priorities continuously to meet the requirements of their
> current workloads. The scheduler is aware of the current demand and makes
> it more expensive to boost the priority under peak usage times. Thus
> users who move their workload to low usage times are rewarded with
> discounts. Priorities can only be boosted within a limited quota.
> All users are given a quota or a budget which is deducted periodically
> in configurable accounting intervals. How much of the budget is
> deducted is determined by a per-user spending rate, which may
> be modified at any time directly by the user. The cluster slots
> share allocated to a particular user is computed as that user's
> spending rate over the sum of all spending rates in the same accounting
> period.
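
The share rule in the Overview above (a user's spending rate over the sum of all spending rates in the same accounting period) can be sketched roughly as follows. This is only an illustration and not the code in the attached patch: the class and method names are hypothetical, and scaling the shares to percentages and returning no shares when nobody is spending are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the HADOOP-4768 patch.
class ShareSketch {

    /**
     * Computes each queue's share as rate / sum(rates), scaled to a
     * percentage (the scaling is an assumption).
     */
    static Map<String, Float> computeShares(Map<String, Float> spendingRates) {
        float total = 0f;
        for (float rate : spendingRates.values()) {
            total += rate;
        }
        Map<String, Float> shares = new LinkedHashMap<>();
        if (total <= 0f) {
            // Assumption: if nobody is spending, no shares are assigned.
            return shares;
        }
        for (Map.Entry<String, Float> e : spendingRates.entrySet()) {
            shares.put(e.getKey(), 100f * e.getValue() / total);
        }
        return shares;
    }
}
```

With spending rates of 1 and 3 for two queues, the first queue would get a quarter of the cluster slots and the second three quarters, regardless of the absolute size of the rates.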
> Configuration
> -------------
> This scheduler has been designed as a meta-scheduler on top of
> existing MapReduce schedulers, which are responsible for enforcing
> shares computed by the dynamic scheduler in the cluster. The configuration
> of this MapReduce scheduler does not have to change when deploying
> the dynamic scheduler.
> Hadoop Configuration (e.g. hadoop-site.xml):
>   mapred.jobtracker.taskScheduler
>       This needs to be set to
>       org.apache.hadoop.mapred.DynamicPriorityScheduler
>       to use the dynamic scheduler.
>   mapred.queue.names
>       All queues managed by the dynamic scheduler must be listed
>       here (comma separated, no spaces).
> Scheduler Configuration:
>   mapred.dynamic-scheduler.scheduler
>       The Java path of the MapReduce scheduler that should enforce
>       the allocated shares. Has been tested with:
>       org.apache.hadoop.mapred.FairScheduler
>       and
>       org.apache.hadoop.mapred.CapacityTaskScheduler
>   mapred.dynamic-scheduler.budgetfile
>       The full OS path of the file from which the budgets are read.
>       The syntax of this file is:
>       <queueName> <budget>
>       separated by newlines, where budget can be specified as a
>       Java float.
>   mapred.dynamic-scheduler.spendfile
>       The full OS path of the file from which the user/queue
>       spending rate is read. It allows the queue name to be placed
>       into the path at runtime, e.g.:
>       /home/%QUEUE%/.spending
>       Only the user(s) who submit jobs to the specified queue
>       should have write access to this file. The syntax of the
>       file is just:
>       <spending rate>
>       where the spending rate is specified as a Java float. If no
>       spending rate is specified the rate defaults to budget/1000.
>   mapred.dynamic-scheduler.alloc
>       Allocation interval, when the scheduler rereads the spending
>       rates and recalculates the cluster shares. Specified as
>       seconds between allocations. Default is 20 seconds.
>   mapred.dynamic-scheduler.budgetset
>       Boolean which is true if the budget should be deducted by the
>       scheduler and the updated budget written to the budget file.
>       Default is true. Setting this to false is useful if there is
>       a tool that controls budgets and spending rates externally to
>       the scheduler.
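
To make the two file formats described above concrete, here is a rough sketch of how budget lines of the form "<queueName> <budget>" and the contents of a spending file could be read. It is not taken from the patch: the class and method names are hypothetical, it works on strings instead of opening files, and treating any run of whitespace as the separator and skipping blank lines are assumptions. The budget/1000 default is the one stated in the README.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the HADOOP-4768 patch.
class BudgetFileSketch {

    /** Parses lines of the form "<queueName> <budget>" into a map. */
    static Map<String, Float> parseBudgets(List<String> lines) {
        Map<String, Float> budgets = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad budget line: " + line);
            }
            budgets.put(parts[0], Float.parseFloat(parts[1]));
        }
        return budgets;
    }

    /**
     * Returns the spending rate found in the spend file contents, or
     * budget/1000 if no rate is specified.
     */
    static float spendingRate(String spendFileContents, float budget) {
        if (spendFileContents == null || spendFileContents.trim().isEmpty()) {
            return budget / 1000f;
        }
        return Float.parseFloat(spendFileContents.trim());
    }
}
```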
> Runtime Configuration:
>   mapred.scheduler.shares
>       The shares that should be allocated to the specified queue.
>       The configuration property is a comma separated list of
>       strings where the odd positioned elements are the queue names
>       and the even positioned elements are the shares as Java floats
>       of the preceding queue name. It is updated for all the queues
>       atomically in each allocation pass. MapReduce schedulers such
>       as the Fair and CapacityTask schedulers are expected to read
>       from this property periodically.
>       Example property value:
>       "queue1,45.0,queue2,55.0"
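
As an illustration of the mapred.scheduler.shares format described above, an enforcing scheduler could decode the property value roughly as sketched below, and the dynamic scheduler could encode it the same way in reverse. The class and method names are hypothetical and are not taken from the patch; rejecting values with an odd number of elements is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the HADOOP-4768 patch.
class SharesProperty {

    /** Encodes shares as "queue1,share1,queue2,share2,...". */
    static String format(Map<String, Float> shares) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : shares.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(',').append(e.getValue());
        }
        return sb.toString();
    }

    /** Decodes a property value; each queue name is followed by its share. */
    static Map<String, Float> parse(String value) {
        Map<String, Float> shares = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return shares;
        }
        String[] parts = value.split(",");
        if (parts.length % 2 != 0) {
            // Assumption: a dangling queue name without a share is an error.
            throw new IllegalArgumentException("Odd number of elements: " + value);
        }
        for (int i = 0; i < parts.length; i += 2) {
            shares.put(parts[i].trim(), Float.parseFloat(parts[i + 1].trim()));
        }
        return shares;
    }
}
```

Because the whole list lives in a single property value, a reader always sees the shares of all queues from the same allocation pass, which matches the atomic update described in the README.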