FairScheduler preemption should only preempt tasks for pools/jobs that are up
next for scheduling
-------------------------------------------------------------------------------------------------
Key: MAPREDUCE-2205
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2205
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/fair-share
Reporter: Joydeep Sen Sarma
We have hit a problem with the preemption implementation in the FairScheduler
where the following happens:
# job X runs short of fair share or min share and requests/causes N tasks to be
preempted
# when slots are then scheduled - tasks from some other job are actually
scheduled
# after preemption_interval has passed, job X finds it's still underscheduled
and requests preemption. goto 1.
This has caused widespread preemption of tasks, taking the cluster from high
utilization to low utilization in a few minutes.
Some of the problems are specific to our internal version of Hadoop (still 0.20,
without the hierarchical FairScheduler) - but I think the issue here is generic
(I just took a look at the trunk assignTasks and tasksToPreempt routines). The
basic problem is that the logic of assignTasks+FairShareComparator is not
consistent with the logic in tasksToPreempt(): the latter can choose to preempt
tasks on behalf of jobs that may not be first up for scheduling according to the
FairShareComparator. Verifying that these two separate pieces of logic are
consistent - and keeping them that way - is difficult.
It seems that a much safer preemption implementation is to walk the jobs in the
order they would be scheduled on the next heartbeat - and only preempt on behalf
of jobs at the head of this sorted queue. In MAPREDUCE-2048, we already
introduced a pre-sorted list of jobs ordered by current scheduling priority. It
seems much easier to preempt only for jobs at the head of this sorted list.
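A minimal sketch of this head-of-queue approach, assuming a simplified Job type and a deficit-based comparator standing in for the real scheduling order (none of these names come from the actual FairScheduler API): sort with the same comparator assignTasks would use, and let only the front of the queue trigger preemption, so any freed slot is guaranteed to go to the job we preempted for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of head-of-queue preemption. Job, minShare, and
// deficit() are illustrative stand-ins for the scheduler's real state.
public class HeadOfQueuePreemption {
  static class Job {
    final String name;
    int runningTasks;
    int minShare;
    Job(String name, int runningTasks, int minShare) {
      this.name = name;
      this.runningTasks = runningTasks;
      this.minShare = minShare;
    }
    int deficit() { return minShare - runningTasks; }
  }

  // The same ordering assignTasks would use on the next heartbeat:
  // the job furthest below its min share comes first.
  static final Comparator<Job> SCHEDULING_ORDER =
      Comparator.comparingInt((Job j) -> -j.deficit());

  // Only the job at the head of the sorted queue may trigger preemption,
  // and only if it is actually below its share.
  static List<Job> preemptionCandidates(List<Job> jobs) {
    List<Job> sorted = new ArrayList<>(jobs);
    sorted.sort(SCHEDULING_ORDER);
    List<Job> candidates = new ArrayList<>();
    if (!sorted.isEmpty() && sorted.get(0).deficit() > 0) {
      candidates.add(sorted.get(0));
    }
    return candidates;
  }

  public static void main(String[] args) {
    Job x = new Job("X", 2, 10);  // 8 tasks short
    Job y = new Job("Y", 5, 8);   // 3 tasks short
    Job z = new Job("Z", 9, 4);   // over its share
    List<Job> candidates = preemptionCandidates(List.of(x, y, z));
    // Only X, the job first up for scheduling, may request preemption.
    System.out.println(candidates.get(0).name);  // prints X
  }
}
```

Because the candidate set is derived from the same comparator that assignTasks uses, the two pieces of logic cannot drift apart the way the current tasksToPreempt() can.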
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.