[ 
https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663229#action_12663229
 ] 

Vivek Ratan commented on HADOOP-4981:
-------------------------------------

bq. If the TT doesn't have enough memory for the job's tasks, what is the point of
trying to check if the job has a task to run? I think the capacity scheduler
should just ask for a task iff the TT has enough memory and then let the
JobInProgress hand out normal, failed or speculative tasks as is done in the
default scheduler.

Suppose the TT does not have enough memory to run a high-mem job's task. This 
usually happens because the TT is running other tasks which are consuming 
memory. In order to prevent the high-mem job from starving, we'd like to return 
nothing to the TT, in the hope that eventually the TT will finish all its tasks 
and will have enough memory to run this job's tasks. In the meantime, if 
another TT comes along with enough memory, great. However, suppose the high-mem
job has no further task to run: no pending tasks and no speculative tasks
(assume all running tasks are making adequate progress). In this case,
returning nothing to the TT is a waste; you should move on to the next job. So
I need a way to tell whether the job potentially has a task to run. Granted, this
is not perfect. At that moment, the job may decide that it can run a
speculative task, but when you actually ask it for a task (which happens at a
later heartbeat), it may return nothing because by then the candidate task has
made enough progress. Still, you want to minimize under-utilization of TTs
while at the same time preventing starvation of high-mem jobs.
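The per-job decision described above can be sketched roughly as follows. This is a hypothetical illustration, not Capacity Scheduler code: the enum, the `decide()` method, the megabyte units and the `mayHaveTaskToRun` flag are all assumptions made for the example.

```java
/**
 * Hypothetical sketch of the per-job decision for a high-mem job. The enum,
 * method name and MB units are illustrative assumptions, not Hadoop code.
 */
final class HighMemDecisionSketch {

    /** What the scheduler does with one job when a TaskTracker heartbeats. */
    enum Decision { ASK_JOB_FOR_TASK, RETURN_NOTHING, MOVE_TO_NEXT_JOB }

    static Decision decide(long trackerFreeMemMB, long taskMemMB, boolean mayHaveTaskToRun) {
        if (trackerFreeMemMB >= taskMemMB) {
            // Enough memory: let the job hand out a normal, failed or speculative
            // task (it may still return nothing, in which case try the next job).
            return Decision.ASK_JOB_FOR_TASK;
        }
        if (mayHaveTaskToRun) {
            // Hold the TT back so it drains its running tasks and the high-mem
            // job is not starved by jobs with smaller tasks.
            return Decision.RETURN_NOTHING;
        }
        // The job has nothing to run anyway, so reserving the TT would waste it.
        return Decision.MOVE_TO_NEXT_JOB;
    }

    public static void main(String[] args) {
        System.out.println(decide(4096, 2048, true));  // ASK_JOB_FOR_TASK
        System.out.println(decide(1024, 2048, true));  // RETURN_NOTHING
        System.out.println(decide(1024, 2048, false)); // MOVE_TO_NEXT_JOB
    }
}
```

The third case is the one under discussion: without some notion of "may have a task to run", the scheduler can only choose between always reserving the TT (wasteful) and never reserving it (risks starving high-mem jobs).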

I was mainly soliciting ways to better detect whether a job has a task to run. The
approach in the patch is to use the same code path that obtains a new task, but
without updating any data structures. The only thing I didn't much like about this
approach is the if-blocks needed to keep the data structures from being updated.
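A rough model of the "same code path, no updates" approach is below. It is a hypothetical sketch, not the patch or `JobInProgress`: the class, its fields, and the speculation rule (a running task trailing the average progress by an assumed gap of 0.2) are illustrative. The `if (!dryRun)` guards stand in for the if-blocks mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical job model whose task-selection path can run as a dry run. */
final class DryRunJobSketch {

    /** Assumed rule: a running task is a speculative candidate if it trails the average by this much. */
    static final double SPECULATIVE_GAP = 0.2;

    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> runningIds = new ArrayList<>();
    private final List<Double> runningProgress = new ArrayList<>();
    private final List<String> speculated = new ArrayList<>();

    void addPending(String taskId) { pending.add(taskId); }

    void addRunning(String taskId, double progress) {
        runningIds.add(taskId);
        runningProgress.add(progress);
    }

    /** Shared path: with dryRun == true it reports the task it would hand out but mutates nothing. */
    String obtainNewTask(boolean dryRun) {
        if (!pending.isEmpty()) {
            String next = pending.peek();
            if (!dryRun) {
                pending.poll();
                addRunning(next, 0.0); // the task is now running with no progress
            }
            return next;
        }
        double avg = runningProgress.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        for (int i = 0; i < runningIds.size(); i++) {
            String id = runningIds.get(i);
            boolean lagging = runningProgress.get(i) < avg - SPECULATIVE_GAP;
            if (lagging && !speculated.contains(id)) {
                if (!dryRun) {
                    speculated.add(id); // at most one speculative attempt per task
                }
                return id + "-speculative";
            }
        }
        return null;
    }

    /** "Does this job potentially have a task to run?" answered by a dry run. */
    boolean mayHaveTaskToRun() { return obtainNewTask(true) != null; }

    public static void main(String[] args) {
        DryRunJobSketch job = new DryRunJobSketch();
        job.addRunning("t1", 0.9);
        job.addRunning("t2", 0.9);
        job.addRunning("t3", 0.3);                    // straggler
        System.out.println(job.mayHaveTaskToRun());   // true (probe only)
        System.out.println(job.obtainNewTask(false)); // t3-speculative
        System.out.println(job.mayHaveTaskToRun());   // false: t3 already speculated
    }
}
```

Sharing one path keeps the probe and the real call from drifting apart, at the cost of the scattered `dryRun` checks. As noted above, a positive probe is only a hint: the straggler may catch up before the later heartbeat that actually asks for the task.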


> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4981
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Priority: Blocker
>         Attachments: 4981.1.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a 
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask()) 
> only if the number of pending tasks for a job is greater than zero (see the 
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending 
> tasks and only has running tasks, it will never be given a slot, and will 
> never have a chance to run a speculative task. 
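The gating problem in the description can be reduced to a two-line comparison. Only `getTaskFromJob()` and the pending-count check come from the issue text; the method names and the `hasSpeculativeCandidate` flag below are hypothetical.

```java
/** Hypothetical before/after of the gate in front of obtainNewMapTask()/obtainNewReduceTask(). */
final class TaskGateSketch {

    /** Gate as described in the issue: ask the job only when it has pending tasks. */
    static boolean shouldAskJobOld(int pendingTasks) {
        return pendingTasks > 0; // speculative candidates are never considered
    }

    /** Widened gate: also ask when a running task could be speculated. */
    static boolean shouldAskJobNew(int pendingTasks, boolean hasSpeculativeCandidate) {
        return pendingTasks > 0 || hasSpeculativeCandidate;
    }

    public static void main(String[] args) {
        // A job with no pending tasks and one straggling running task.
        System.out.println(shouldAskJobOld(0));       // false: the job never gets a slot
        System.out.println(shouldAskJobNew(0, true)); // true
    }
}
```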

-- 
This message is automatically generated by JIRA.