[ https://issues.apache.org/jira/browse/HADOOP-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672472#action_12672472 ]
matei edited comment on HADOOP-5187 at 2/10/09 6:17 PM:
----------------------------------------------------------------
Yes, the design would be as follows:
* Each job belongs to a pool. Pools may be marked as either FIFO or fair
sharing.
* Each pool has a minimum share (guaranteed share) defined in the config. Any
excess capacity is divided between pools according to fair sharing, as in the
current scheduler (the first sketch after this list shows this step).
* Each pool takes its min share and fair share and divides them among the jobs
inside the pool:
** For a fair-sharing pool, we divide the min and fair shares equally among
jobs, as happens now (well, technically using weights).
** For a FIFO pool, we give as much of the min share as possible to the first
job, any excess to the second job (if the first job didn't have enough
unlaunched tasks to consume the pool's full share), and so on until we run out.
The same applies to the fair share. The second sketch below illustrates this
division.
* Now, for the purpose of scheduling, we can have one big list of runnable
jobs, each of which has a min share and a fair share. We sort this list first
by whether the job is below its min share (breaking ties by how long it's been
below it), and then, for the remaining jobs, by how far each job is below its
fair share (as a percentage); the comparator sketch after this list shows this
ordering. We then scan through the list to pick tasks, using the same wait
technique proposed in HADOOP-4667 to skip jobs that don't happen to have local
tasks for the current heartbeat.
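Here's a rough Java sketch of the pool-level step. All names here (Pool, computePoolShares, etc.) are illustrative rather than the scheduler's actual API; the sketch also ignores demand caps and assumes the min shares sum to no more than the cluster's capacity:
{code:java}
import java.util.List;

class PoolShares {
  // Every pool first receives its configured min (guaranteed) share;
  // the leftover cluster capacity is then split among pools in
  // proportion to their weights.
  static void computePoolShares(List<Pool> pools, double clusterSlots) {
    double excess = clusterSlots;
    double totalWeight = 0;
    for (Pool p : pools) {
      excess -= p.minShare();   // reserve the guaranteed shares first
      totalWeight += p.weight();
    }
    for (Pool p : pools) {
      // Fair share = guaranteed share + weighted slice of the excess.
      p.setFairShare(p.minShare() + excess * p.weight() / totalWeight);
    }
  }
}

class Pool {
  private final double minShare; // guaranteed share from the config
  private final double weight;   // weight used to divide excess capacity
  private double fairShare;

  Pool(double minShare, double weight) {
    this.minShare = minShare;
    this.weight = weight;
  }
  double minShare() { return minShare; }
  double weight() { return weight; }
  double fairShare() { return fairShare; }
  void setFairShare(double s) { fairShare = s; }
}
{code}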
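The in-pool division then looks roughly like this (again, Job, demand(), and setShare() are made-up names; the fair branch also glosses over the redistribution needed when a job's demand is below its weighted slice):
{code:java}
import java.util.List;

class InPoolDivider {
  // Divide a pool's share (min or fair) among its jobs.
  static void divideShare(List<Job> jobs, double poolShare, boolean fifo) {
    if (fifo) {
      // FIFO pool: walk jobs in submission order, giving each as much
      // of the share as it can use (capped by its unlaunched-task
      // demand) and passing any excess on to the next job.
      double remaining = poolShare;
      for (Job job : jobs) {
        double given = Math.min(remaining, job.demand());
        job.setShare(given);
        remaining -= given;
      }
    } else {
      // Fair-sharing pool: split the share in proportion to job
      // weights (equal weights reduce to an even split).
      double totalWeight = 0;
      for (Job job : jobs) totalWeight += job.weight();
      for (Job job : jobs) {
        job.setShare(poolShare * job.weight() / totalWeight);
      }
    }
  }
}

class Job {
  private final double weight;  // scheduling weight (1.0 = normal)
  private final double demand;  // unlaunched tasks the job could still run
  private double share;         // share assigned to this job by its pool

  Job(double weight, double demand) {
    this.weight = weight;
    this.demand = demand;
  }
  double weight() { return weight; }
  double demand() { return demand; }
  double share() { return share; }
  void setShare(double s) { share = s; }
}
{code}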
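And the sort order over the global runnable-job list could be expressed as a comparator like the one below (field names such as belowMinShareSince are hypothetical):
{code:java}
import java.util.Comparator;

class JobInfo {
  double runningTasks;     // tasks the job is currently running
  double minShare;         // guaranteed share assigned by its pool
  double fairShare;        // fair share assigned by its pool
  long belowMinShareSince; // when the job last dropped below its min share

  boolean belowMinShare() { return runningTasks < minShare; }

  // Fraction of the fair share in use; smaller means more starved.
  double fairShareRatio() { return runningTasks / Math.max(fairShare, 1e-9); }
}

class JobSchedulingComparator implements Comparator<JobInfo> {
  public int compare(JobInfo a, JobInfo b) {
    boolean aNeedy = a.belowMinShare();
    boolean bNeedy = b.belowMinShare();
    if (aNeedy && !bNeedy) return -1; // below-min-share jobs come first
    if (!aNeedy && bNeedy) return 1;
    if (aNeedy && bNeedy) {
      // Both below min share: whoever has been below it longest wins.
      if (a.belowMinShareSince != b.belowMinShareSince) {
        return a.belowMinShareSince < b.belowMinShareSince ? -1 : 1;
      }
      return 0;
    }
    // Neither below min share: the job furthest below its fair share
    // (as a percentage) comes first.
    return Double.compare(a.fairShareRatio(), b.fairShareRatio());
  }
}
{code}
Sorting the runnable-job list with this comparator and scanning it front to back, applying the HADOOP-4667 locality wait per job, gives the task-selection loop described above.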
On top of this, we can have any logic we want for user limits, when to
initialize jobs, etc. (as we've been discussing in other JIRAs).
I think this should work without very complicated code, and it will be much
easier to understand than the current deficit-based logic. It also leaves the
option open to have pools with scheduling disciplines other than FIFO or fair
sharing, since each pool's only responsibility is to subdivide its own min and
fair shares among the jobs within it. This might enable something like
HADOOP-5199.
> Provide an option to turn off priorities in jobs
> ------------------------------------------------
>
> Key: HADOOP-5187
> URL: https://issues.apache.org/jira/browse/HADOOP-5187
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/fair-share
> Reporter: Hemanth Yamijala
> Priority: Minor
>
> The fairshare scheduler can define pools that map to queues (as defined in the
> capacity scheduler - HADOOP-3445). When used in this manner, one can imagine
> queues set up for users who come from disparate teams or organizations (say,
> a default queue). For such a queue, it makes sense to ignore job priorities
> and treat the queue as strictly FIFO, since it is difficult to compare the
> priorities of jobs from different users.