[ 
https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666427#action_12666427
 ] 

Vivek Ratan commented on HADOOP-4981:
-------------------------------------

If I understand correctly, you're suggesting that we skip a job whose memory 
requirements cannot be met, while making sure we don't skip it so many times 
that it starves, as opposed to blocking right away (i.e., returning no task to 
the TT).

We did consider this approach a while back, and the general consensus was that 
it's better to block right away than to block selectively. Regardless, I don't 
think that solves the problem this Jira is addressing. Whether you do it once 
in a while or always, you're still going to need to look at a high-mem job at 
some point and decide whether or not to block the TT. And you're still going to 
need to see whether the high-mem job has at least one task to run. Of course, 
you could skip this step by always blocking the TT, but then you would have 
underutilized TTs if the high-mem job has no more tasks to run. 

I thought the real issue here was how to write clean code to detect whether a 
job has a task to run; i.e., it's more of a software design problem than a 
performance issue. We can certainly discuss (or re-discuss) whether it makes 
sense to block always or only once in a while, but that seems like a separate 
discussion. Am I missing something? 
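To make the trade-off concrete, here is a minimal sketch of the decision I'm 
describing. All names here (Job, assignTask, BLOCK_TT, hasRunnableTask) are 
illustrative assumptions, not the actual Capacity Scheduler API:

```java
// Sketch of the blocking decision, under assumed names. Not the real API.
final class Job {
    private final int memRequired;
    private final int pendingTasks;
    private final int runningTasks;

    Job(int memRequired, int pendingTasks, int runningTasks) {
        this.memRequired = memRequired;
        this.pendingTasks = pendingTasks;
        this.runningTasks = runningTasks;
    }

    int memRequired() { return memRequired; }

    // A pending task, or a running task that could be run speculatively.
    boolean hasRunnableTask() {
        return pendingTasks > 0 || runningTasks > 0;
    }

    String obtainTask() { return "task"; } // stand-in for obtainNewMapTask()
}

final class SchedulingSketch {
    static final String BLOCK_TT = "BLOCK"; // reserve the slot: hand out no task

    static String assignTask(Job job, int freeMemOnTT) {
        if (job.memRequired() > freeMemOnTT) {
            // Block the TT only if the high-mem job actually has a task left;
            // blocking unconditionally leaves the slot idle when it does not.
            return job.hasRunnableTask() ? BLOCK_TT : null;
        }
        return job.obtainTask();
    }
}
```

Either way you slice it, the hasRunnableTask() check is unavoidable, which is 
why I see this as a design question about where that check lives, not a 
question of skipping versus blocking.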

> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4981
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Priority: Blocker
>         Attachments: 4981.1.patch, 4981.2.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a 
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask()) 
> only if the number of pending tasks for a job is greater than zero (see the 
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending 
> tasks and only has running tasks, it will never be given a slot, and will 
> never have a chance to run a speculative task. 
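
The guard described in the report can be sketched roughly as follows. This is 
a simplified illustration, not the actual code; field names and the 
runningMaps bookkeeping are assumptions, though getTaskFromJob() and 
obtainNewMapTask() are the methods named above:

```java
// Simplified illustration of the guard described in the issue; names other
// than getTaskFromJob() and obtainNewMapTask() are assumptions.
final class JobInProgress {
    int pendingMaps;
    int runningMaps;

    JobInProgress(int pendingMaps, int runningMaps) {
        this.pendingMaps = pendingMaps;
        this.runningMaps = runningMaps;
    }

    String obtainNewMapTask() {
        // The real JobInProgress can hand out a speculative task here even
        // when there are no pending maps, as long as some maps are running.
        return (pendingMaps > 0 || runningMaps > 0) ? "task" : null;
    }
}

final class TaskSchedulingMgrSketch {
    static String getTaskFromJob(JobInProgress job) {
        // The HADOOP-4035 fix added this guard: a job with pending == 0 is
        // skipped, so obtainNewMapTask() is never reached and the job never
        // gets a slot for a speculative task.
        if (job.pendingMaps > 0) {
            return job.obtainNewMapTask();
        }
        return null;
    }
}
```

In this sketch, a job with zero pending maps and five running maps gets 
nothing from getTaskFromJob(), even though obtainNewMapTask() itself would 
have returned a speculative task.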

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
