[ 
https://issues.apache.org/jira/browse/HADOOP-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated HADOOP-4665:
----------------------------------

    Attachment: fs-preemption-v0.patch

Here is an initial version of the patch for review. The main thing missing is 
unit tests.

The patch adds two things. The first is preemption, which works as described 
in the issue: a job may preempt others if it has not received its guaranteed 
share for some time, or if it has been below half its fair share with a 
negative deficit for some time. Both timeouts are set in the fair scheduler 
config file, so they can be changed at runtime, and the guaranteed share 
timeouts are per pool. The second, to aid future debugging and development of 
the fair scheduler, is a scheduler event log. It is disabled by default; when 
turned on, it writes event logs in tab-separated format to 
$hadoop.log.dir/fairscheduler. These are meant to be nitty-gritty detailed 
logs with machine-parsable event types, rather than the "human-readable" logs 
that go into the standard log4j log for the JobTracker. They are also 
potentially much larger on a large cluster, which is why they're off by 
default.
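
To make the two preemption conditions concrete, here is a rough sketch of the 
check in Python. It is not code from the patch: JobState, 
update_starvation_clocks, may_preempt and all field names are hypothetical, 
the timeouts are passed in as plain arguments rather than read from the config 
file, and the deficit sign convention is taken as stated above.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical per-job bookkeeping; field names are illustrative only."""
    running_tasks: int
    guaranteed_share: int          # slots guaranteed to the job's pool
    fair_share: float              # slots the job would get under fair sharing
    deficit: float                 # sign convention as described in the text
    last_ok_guaranteed: float = 0.0  # last time the guaranteed share was met
    last_ok_fair: float = 0.0        # last time the fair-share condition was met


def update_starvation_clocks(job: JobState, now: float) -> None:
    """Reset each clock whenever its starvation condition does not hold."""
    if job.running_tasks >= job.guaranteed_share:
        job.last_ok_guaranteed = now
    starved_fair = job.running_tasks < job.fair_share / 2 and job.deficit < 0
    if not starved_fair:
        job.last_ok_fair = now


def may_preempt(job: JobState, now: float,
                guaranteed_timeout: float, fair_timeout: float) -> bool:
    """True if either condition has held for at least its timeout (seconds)."""
    update_starvation_clocks(job, now)
    return (now - job.last_ok_guaranteed >= guaranteed_timeout
            or now - job.last_ok_fair >= fair_timeout)
```

In this sketch a job whose guaranteed share is 0 never trips the first 
condition, so only the (longer) fair share timeout applies to it.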

I'm running this through Hudson to see whether there are complaints from 
findbugs, checkstyle, etc. I will include some unit tests in the final 
patch.

> Add preemption to the fair scheduler
> ------------------------------------
>
>                 Key: HADOOP-4665
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4665
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>         Attachments: fs-preemption-v0.patch
>
>
> Task preemption is necessary in a multi-user Hadoop cluster for two reasons: 
> users might submit long-running tasks by mistake (e.g. an infinite loop in a 
> map program), or tasks may be long due to having to process large amounts of 
> data. The Fair Scheduler (HADOOP-3746) has a concept of guaranteed capacity 
> for certain queues, as well as a goal of providing good performance for 
> interactive jobs on average through fair sharing. Therefore, it will support 
> preempting under two conditions:
> 1) A job isn't getting its _guaranteed_ share of the cluster for at least T1 
> seconds.
> 2) A job is getting significantly less than its _fair_ share for T2 seconds 
> (e.g. less than half its share).
> T1 will be chosen smaller than T2 (and will be configurable per queue) to 
> meet guarantees quickly. T2 is meant as a last resort in case non-critical 
> jobs in queues with no guaranteed capacity are being starved.
> When deciding which tasks to kill to make room for the job, we will use the 
> following heuristics:
> - Look for tasks to kill only in jobs that have more than their fair share, 
> ordering these by deficit (most overscheduled jobs first).
> - For maps: kill tasks that have run for the least amount of time (limiting 
> wasted time).
> - For reduces: similar to maps, but give extra preference for reduces in the 
> copy phase where there is not much map output per task (at Facebook, we have 
> observed this to be the main time we need preemption - when a job has a long 
> map phase and its reducers are mostly sitting idle and filling up slots).
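
The victim selection heuristics quoted above could be sketched roughly as 
follows, in Python. This is an illustration under assumptions, not the patch's 
implementation: Task, Job, choose_victims and their fields are hypothetical, 
"overschedule" stands in for the deficit-based ordering (larger means more 
overscheduled), the "not much map output" preference is reduced to a single 
in_copy_phase flag, and the cap that keeps a job from dropping below its fair 
share is an added assumption.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    task_id: str
    runtime: float               # seconds the task has been running
    in_copy_phase: bool = False  # only meaningful for reduces


@dataclass
class Job:
    name: str
    fair_share: float
    overschedule: float          # hypothetical: larger = more overscheduled
    tasks: List[Task] = field(default_factory=list)


def choose_victims(jobs: List[Job], needed: int,
                   is_reduce: bool = False) -> List[str]:
    """Return up to `needed` task ids to kill, following the heuristics."""
    # Only look in jobs running more than their fair share,
    # most overscheduled first.
    over = [j for j in jobs if len(j.tasks) > j.fair_share]
    over.sort(key=lambda j: j.overschedule, reverse=True)

    victims: List[str] = []
    for job in over:
        # Assumption: never push a job below its fair share.
        surplus = len(job.tasks) - math.ceil(job.fair_share)
        if is_reduce:
            # Prefer reduces still in the copy phase, then shortest runtime.
            order = sorted(job.tasks,
                           key=lambda t: (not t.in_copy_phase, t.runtime))
        else:
            # Maps: shortest-running first, to limit wasted work.
            order = sorted(job.tasks, key=lambda t: t.runtime)
        for task in order[:surplus]:
            if len(victims) == needed:
                return victims
            victims.append(task.task_id)
    return victims
```

Sorting reduces on the pair (not in_copy_phase, runtime) puts every copy-phase 
reduce ahead of the others and falls back to the map rule within each group.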

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
