[ https://issues.apache.org/jira/browse/MAPREDUCE-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776149#action_12776149 ]

Arun C Murthy commented on MAPREDUCE-1204:
------------------------------------------

I'd rather have JobInProgress.shouldRunOnTaskTracker public for now.

Long term the right direction is for schedulers to maintain all scheduling 
information themselves and not rely on JobInProgress at all.
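A minimal sketch of that long-term direction follows. It assumes the scheduler is notified whenever a job marks a tracker flaky, so it can answer "would this job accept a slot here?" without asking JobInProgress. SchedulerTrackerState, markFlaky and canRunOn are hypothetical names, not Fair Scheduler or JobInProgress API. The real rule for deciding that a tracker is flaky is reduced here to a single call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical scheduler-side record of which trackers each job has marked
 * flaky. This is an illustration only, not part of the Fair Scheduler.
 */
final class SchedulerTrackerState {
    // jobId -> trackers that job refuses to run on
    private final Map<String, Set<String>> flakyTrackersByJob = new HashMap<>();

    // Assumed hook: called when a job blacklists ("marks flaky") a tracker.
    void markFlaky(String jobId, String tracker) {
        flakyTrackersByJob.computeIfAbsent(jobId, k -> new HashSet<>()).add(tracker);
    }

    // True if, as far as the scheduler knows, the job would accept a slot on the tracker.
    boolean canRunOn(String jobId, String tracker) {
        Set<String> flaky = flakyTrackersByJob.get(jobId);
        return flaky == null || !flaky.contains(tracker);
    }
}
```

Keeping this state in the scheduler means it must observe every event that changes it. That is the cost of not relying on JobInProgress.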

> Fair Scheduler preemption may preempt tasks running in slots unusable by the preempting job
> -------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1204
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1204
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>
> The current preemption code works by first calculating how many tasks need to 
> be preempted to satisfy the min share constraints, and then killing an equal 
> number of tasks from other jobs, sorted to favor killing of young tasks. This 
> works fine for the general case, but there are some edge cases where this can 
> cause problems.
> For example, if the preempting job has blacklisted ("marked flaky") a 
> particular task tracker, and that tracker is running the youngest task, 
> preemption can still kill that task. The preempting job will then refuse that 
> slot, since the tracker has been blacklisted. The same task that just got 
> killed then gets rescheduled in that slot. This repeats ad infinitum until a 
> new slot opens in the cluster.
> I don't have a good test case for this yet, but logically it is possible.
> One potential fix would be to add an API to JobInProgress that functions 
> identically to obtainNewMapTask but does not schedule the task. The 
> preemption code could then use this while iterating through the sorted 
> preemption list to check that the preempting jobs can actually make use of 
> the candidate slots before killing them.
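The fix proposed in the report can be sketched as a filter over the youngest-first candidate list. This is a minimal illustration, not Fair Scheduler code. RunningTask, PreemptionSketch and chooseTasksToPreempt are hypothetical. The canUse predicate stands in for whatever check the preempting jobs expose, for example the proposed dry-run variant of obtainNewMapTask, or JobInProgress.shouldRunOnTaskTracker once it is accessible. When several jobs are preempting, canUse is assumed to return true if any of them could use the tracker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical view of a running task that is a candidate for preemption.
final class RunningTask {
    final String taskId;
    final String tracker;  // task tracker the task is running on
    final long startTime;  // larger value means a younger task

    RunningTask(String taskId, String tracker, long startTime) {
        this.taskId = taskId;
        this.tracker = tracker;
        this.startTime = startTime;
    }
}

final class PreemptionSketch {
    /**
     * Picks up to {@code needed} tasks to kill, youngest first, skipping any
     * task whose slot the preempting job(s) could not use.
     */
    static List<RunningTask> chooseTasksToPreempt(List<RunningTask> candidates,
                                                  int needed,
                                                  Predicate<String> canUse) {
        List<RunningTask> sorted = new ArrayList<>(candidates);
        // Youngest first, matching the ordering the current preemption code uses.
        sorted.sort(Comparator.comparingLong((RunningTask t) -> t.startTime).reversed());

        List<RunningTask> toKill = new ArrayList<>();
        for (RunningTask t : sorted) {
            if (toKill.size() >= needed) {
                break;
            }
            // Killing a task on a tracker the preempting job refuses would only
            // let the same task be rescheduled there: the loop described above.
            if (canUse.test(t.tracker)) {
                toKill.add(t);
            }
        }
        return toKill;
    }
}
```

With tasks on tt1, tt2 and tt3 started at times 100, 300 and 200, and tt2 blacklisted by the preempting job, asking for one task skips the youngest (on tt2) and returns the task on tt3. If fewer than needed usable candidates exist, the sketch returns a shorter list rather than killing tasks in slots the preempting job cannot use.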

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
