[
https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vivek Ratan updated HADOOP-4981:
--------------------------------
Attachment: 4981.1.patch
I can see two ways to solve this problem.
# Check if the TT has enough memory for the job. If yes, obtain a task from the
job, as before. If not, determine whether the job has a task to run.
Determining this can be expensive (about as costly as
_obtainNewMapTask_ or _obtainNewReduceTask_; in fact, we can call the same
methods, passing in a flag that stops them from updating data structures), but we
only do it for high-memory jobs when the TT does not have enough memory, so
we're no worse off than in the normal case, where we call these methods anyway.
# First obtain a task from the job. If a task is returned, then check its memory
requirements. If the TT does not have enough memory, return the task to the
job. The code to return a task can be complicated, as we need to undo every
update made to the data structures while obtaining the task.
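To make Option 1 concrete, here is a minimal sketch of the check order it describes. Everything in it is hypothetical and simplified for illustration: _SketchJob_, _Option1Sketch_, _Outcome_, _tryJob_, _hasNewTask_ and _memPerTaskMb_ are made-up names, not the real _JobInProgress_ or Capacity Scheduler API, and what the scheduler does with each outcome is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a job; NOT the real JobInProgress. */
class SketchJob {
    final long memPerTaskMb;                       // assumed per-task memory need
    private final Deque<String> runnable = new ArrayDeque<>();

    SketchJob(long memPerTaskMb, String... tasks) {
        this.memPerTaskMb = memPerTaskMb;
        for (String t : tasks) {
            runnable.add(t);
        }
    }

    /** Read-only probe: reports whether a task could be handed out. */
    boolean hasNewTask() {
        return !runnable.isEmpty();
    }

    /** Mutating call: removes and returns the next task, or null if none. */
    String obtainNewTask() {
        return runnable.poll();
    }
}

/** Hypothetical scheduler-side check following the order in Option 1. */
final class Option1Sketch {
    enum Outcome { TASK_ASSIGNED, NO_TASK, HAS_TASK_BUT_NOT_ENOUGH_MEM }

    static Outcome tryJob(SketchJob job, long freeMemMb) {
        if (freeMemMb >= job.memPerTaskMb) {
            // Enough memory on the TT: obtain a task as before.
            return job.obtainNewTask() != null ? Outcome.TASK_ASSIGNED : Outcome.NO_TASK;
        }
        // Not enough memory: only probe the job, never change its state.
        return job.hasNewTask() ? Outcome.HAS_TASK_BUT_NOT_ENOUGH_MEM : Outcome.NO_TASK;
    }
}
```

The point of the sketch is that the low-memory branch never mutates the job, so there is nothing to undo, which is the complication Option 2 would have to handle.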
Option 1 looks better. I've attached a patch (4981.1.patch) that adds two new
methods to _JobInProgress_: _hasNewMapTask_ and _hasNewReduceTask_. Both
just call the respective _obtainNewXXXTask_, but with a flag that marks
the call as read-only. I have modified _obtainNewXXXTask_ to take the
'readOnly' flag and to update data structures and log messages only if the flag is
false (thanks, Devaraj, for the idea). The only thing I don't like about this patch
is the number of if statements that check the flag before logging or updating
a data structure. Another option is to refactor the JobInProgress code to
separate detecting a task from updating data structures, but that
seems much messier.
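For readers who have not opened the patch, the read-only flag pattern looks roughly like the sketch below. This is not the patch itself: _ReadOnlyFlagSketch_, its fields and the counter accessors are hypothetical, and the real _JobInProgress_ methods take other arguments and maintain far more state. Only the method names _hasNewMapTask_ and _obtainNewMapTask_ and the 'readOnly' flag come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, heavily simplified stand-in for JobInProgress. */
class ReadOnlyFlagSketch {
    private final Deque<String> pendingMaps = new ArrayDeque<>();
    private int runningMaps = 0;

    ReadOnlyFlagSketch(String... maps) {
        for (String m : maps) {
            pendingMaps.add(m);
        }
    }

    /** Read-only probe: same lookup as the real call, but no side effects. */
    boolean hasNewMapTask() {
        return obtainNewMapTask(true) != null;
    }

    /** Normal call: hands out a task and updates the job's state. */
    String obtainNewMapTask() {
        return obtainNewMapTask(false);
    }

    private String obtainNewMapTask(boolean readOnly) {
        String candidate = pendingMaps.peek();     // detection: no mutation yet
        if (candidate == null) {
            return null;
        }
        if (!readOnly) {
            // Every update and log message is guarded by the flag.
            pendingMaps.poll();
            runningMaps++;
            System.out.println("Choosing map task " + candidate);
        }
        return candidate;
    }

    int pendingMapCount() {
        return pendingMaps.size();
    }

    int runningMapCount() {
        return runningMaps;
    }
}
```

With a single update site the guard is cheap; the concern about the number of if statements comes from a method that touches many structures, each needing its own check.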
> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4981
> URL: https://issues.apache.org/jira/browse/HADOOP-4981
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vivek Ratan
> Priority: Blocker
> Attachments: 4981.1.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask())
> only if the number of pending tasks for a job is greater than zero (see the
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending
> tasks and only has running tasks, it will never be given a slot, and will
> never have a chance to run a speculative task.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.