[ https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4981:
--------------------------------

    Attachment: 4981.1.patch

I can see two ways to solve this problem. 
# Check whether the TT has enough memory for the job. If it does, obtain a 
task from the job, as before. If it does not, determine whether the job has a 
task to run. Determining this can be expensive (it should cost about as much as 
_obtainNewMapTask_ or _obtainNewReduceTask_; in fact, we can call the same 
methods and pass in a flag that tells them not to update any data structures), 
but we only do it for high-memory jobs when the TT does not have enough memory, 
so we're no worse off than in the normal case. 
# First obtain a task from the job. If a task is returned, check its memory 
requirements. If the TT does not have enough memory, 'return' the task to the 
job. The code to return a task can be complicated, as we need to undo every 
update made to the data structures while obtaining the task. 
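
To make option 1 concrete, here is a minimal sketch of the decision flow. It 
is not the patch: SketchJob, Lookup and OptionOneSketch.tryJob are 
hypothetical names, memory is reduced to a single number in MB, and what the 
scheduler does once a job is found to be blocked on memory is left out. 

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a job; this is not the real JobInProgress. */
class SketchJob {
    private final Deque<String> runnable = new ArrayDeque<>();
    final long memPerTaskMb;

    SketchJob(long memPerTaskMb, String... taskIds) {
        this.memPerTaskMb = memPerTaskMb;
        for (String id : taskIds) {
            runnable.add(id);
        }
    }

    /** Read-only probe: reports whether a task exists without handing it out. */
    boolean hasNewTask() {
        return !runnable.isEmpty();
    }

    /** Hands out a task and updates the job's bookkeeping; null if none. */
    String obtainNewTask() {
        return runnable.poll();
    }

    int remaining() {
        return runnable.size();
    }
}

/** Outcome of one scheduling attempt for one job on one TT (assumed names). */
enum Lookup { TASK_FOUND, NO_TASK, BLOCKED_ON_MEMORY }

class OptionOneSketch {
    /** Option 1: check memory first; probe the job only if memory is short. */
    static Lookup tryJob(SketchJob job, long ttFreeMemMb) {
        if (ttFreeMemMb >= job.memPerTaskMb) {
            // Enough memory: obtain a task exactly as before.
            return job.obtainNewTask() != null ? Lookup.TASK_FOUND : Lookup.NO_TASK;
        }
        // Not enough memory: ask whether a task exists, without side effects.
        return job.hasNewTask() ? Lookup.BLOCKED_ON_MEMORY : Lookup.NO_TASK;
    }
}
```

The point of the sketch is that the memory-short path never changes the job's 
state, so there is nothing to undo, which is the work option 2 would require. 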

Option 1 looks better. I've attached a patch (4981.1.patch) that adds two new 
methods to _JobInProgress_: _hasNewMapTask_ and _hasNewReduceTask_. Each of 
these just calls the corresponding _obtainNewXXXTask_ method with a flag that 
marks the call as read-only. I have modified the _obtainNewXXXTask_ methods to 
take a 'readOnly' flag and to update data structures and log messages only if 
the flag is false (thanks, Devaraj, for the idea). The only thing I don't like 
about this patch is the number of if statements that check the flag before 
logging or updating a data structure. Another option is to refactor the 
_JobInProgress_ code to separate detecting a task from updating data 
structures, but that seems much messier. 
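
For readers without the patch at hand, the read-only flag pattern looks 
roughly like the sketch below. This is an illustration under assumptions, not 
the code in 4981.1.patch: ReadOnlyFlagSketch is a hypothetical miniature of a 
job, tasks are plain strings, and the real _obtainNewXXXTask_ methods take 
other arguments that are omitted here. 

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of a job; the real JobInProgress is far larger. */
class ReadOnlyFlagSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> running = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    ReadOnlyFlagSketch(List<String> pendingTasks) {
        pending.addAll(pendingTasks);
    }

    /** Finds a task; updates state and logs only when readOnly is false. */
    String obtainNewMapTask(boolean readOnly) {
        if (pending.isEmpty()) {
            return null;
        }
        String task = pending.get(0);
        // Each side effect is guarded separately, mirroring the scattered
        // flag checks mentioned above.
        if (!readOnly) {
            pending.remove(0);
            running.add(task);
        }
        if (!readOnly) {
            log.add("Choosing task " + task);
        }
        return task;
    }

    /** Read-only probe built on the same lookup code. */
    boolean hasNewMapTask() {
        return obtainNewMapTask(true) != null;
    }

    int pendingCount() { return pending.size(); }
    int runningCount() { return running.size(); }
    int logSize() { return log.size(); }
}
```

Calling hasNewMapTask() any number of times leaves the pending list, the 
running list and the log untouched; only obtainNewMapTask(false) changes them. 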


> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4981
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Priority: Blocker
>         Attachments: 4981.1.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a 
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask()) 
> only if the number of pending tasks for a job is greater than zero (see the 
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending 
> tasks and only has running tasks, it will never be given a slot, and will 
> never have a chance to run a speculative task. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
