[
https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666454#action_12666454
]
Vivek Ratan commented on HADOOP-4981:
-------------------------------------
bq. ...but this change modifies substantial parts of the Map-Reduce framework
in ways difficult to understand for a relatively uncommon corner-case.
Even if you think this is a 'relatively uncommon corner-case', I don't believe
it will remain so. I agree with Matei: detecting whether a job has a task to
run seems to be a relatively important piece of functionality that can be used
in scheduling (see another use case below).
bq. In any case a high-mem job might not have a task to run at a given moment,
but what happens when it's running tasks fail, tasktrackers go down etc. ?
I don't see any problem here. If a high-mem job doesn't currently have tasks to
run, you move on to the next job. If running tasks fail or TTs go down, the
high-mem job will eventually have tasks to run, so that at some point, when we
check if it has tasks to run, the answer is yes, and we will block the TT. The
opposite case is a bit more interesting. A job may say it has tasks to run
because, at that point, one of the tasks is a candidate for speculation as it's
progressing slowly. So you block the TT. Eventually, the task that could have
been speculated catches up, so that when a slot is actually free, you don't
really need to run a speculative task. So you did block a TT unnecessarily. But
I think that's OK, rare, and probably unavoidable.
bq. From a design perspective we have to recognize that currently the
Map-Reduce framework isn't fundamentally setup for what you are trying to do
That's not clear to me. You ask a JobInProgress to give you a task. In terms of
design, it seems perfectly natural to ask a JobInProgress object if it has a
task (a 'peek' versus a 'get'). Or, if it gives you a task, you can ask it to
take it back. This feature can be very useful in some other use cases. We've
heard of jobs that deal with third party licenses. Maps in different jobs may
need to access different licenses to run (or, instead of licenses, you can
think of rate limiting: at any given time, only, say, 30 maps can have an open
connection to some external resource). What is needed then is first to find out
which task from which job would run, then to check whether the license for that
particular task is valid. If it isn't, you want to put the task back in the
job, so to speak. So design-wise, I see this as a useful feature and logical to
have in JobInProgress. If you're suggesting that there's too much code to
untangle to support this feature, that's a slightly different situation. Are
you? Granted that a read-only flag is somewhat ugly, but does it make the code
so unmaintainable that we'd prefer taking a performance hit? I'm not so sure. Sure,
the performance hit can be mitigated by ignoring the high-mem job occasionally,
but you can end up with a not-insignificant number of TTs being blocked if the
high-mem job also has a large number of tasks.
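To make the 'peek' versus 'get' distinction concrete, here is a minimal Java sketch. The names ({{hasTaskToRun}}, {{obtainTask}}, {{returnTask}}) are illustrative assumptions for this comment, not the actual JobInProgress API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a JobInProgress-like object, to illustrate
// a non-mutating 'peek' next to the existing obtainNewMapTask()-style 'get'.
class SketchJob {
    private final Deque<String> pending = new ArrayDeque<>();

    SketchJob(String... taskIds) {
        for (String id : taskIds) pending.add(id);
    }

    // 'peek': would this job hand out a task right now? Does not mutate
    // state, so a scheduler can decide to block a TaskTracker without
    // committing a task to it.
    boolean hasTaskToRun() {
        return !pending.isEmpty();
    }

    // 'get': actually hand a task out.
    String obtainTask() {
        return pending.poll();
    }

    // 'take it back': e.g. the license check for the chosen task failed,
    // so the task is returned to the front of the pending queue.
    void returnTask(String taskId) {
        pending.addFirst(taskId);
    }
}
```

With a shape like this, the license/rate-limiting use case becomes: obtain a task, check the external constraint, and return the task if the constraint fails.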
I also think that skipping a high mem job for a certain period can really hurt.
The code, based on comments in HADOOP-4667, will look like this:
{code}
if (TT has enough space for the high-mem job) {
  get task from high-mem job;
} else {
  if (we've skipped this job too many times already) {
    block TT (return no task to it);
  } else {
    note that we're skipping this job;
    look at next job;
  }
}
{code}
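As a runnable illustration of that skip-then-block policy (the threshold and names here are assumed for illustration, not taken from HADOOP-4667):

```java
// Hypothetical sketch of the skip-counting policy sketched above.
// MAX_SKIPS is an assumed threshold, not an actual config value.
class SkipSketch {
    static final int MAX_SKIPS = 3;

    private int skips = 0;

    // One scheduling decision for the high-mem job on a heartbeat:
    // "ASSIGN" = give it a task, "BLOCK" = return no task to the TT,
    // "SKIP" = pass over the job and look at the next one.
    String schedule(boolean ttHasEnoughMemory) {
        if (ttHasEnoughMemory) {
            skips = 0;
            return "ASSIGN";   // get task from high-mem job
        }
        if (skips >= MAX_SKIPS) {
            return "BLOCK";    // block TT (return no task to it)
        }
        skips++;               // note that we're skipping this job
        return "SKIP";         // look at next job
    }
}
```

Note that every "SKIP" round is a round in which the TT's slots can be handed to other jobs, which is exactly the delay in blocking that the next paragraph argues against.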
The whole point of blocking a TT is that you want it to finish its existing
tasks quickly so it has enough space for the high-mem job, i.e., you're
improving the chances of the TT satisfying this job's request the next time. If
you delay the blocking, you do NOT improve the chances of a high-mem job being
satisfied by the TT as much. By delaying blocking, you're going to end up
starving high-mem jobs even more.
I realize that the fix involves a non-trivial refactoring of critical code, but
I'm not convinced we can't or shouldn't do it. Anybody else agree/disagree?
Again, does the read-only flag really make the code so unmaintainable? My
first patch was just a way to see how we could do something. Let me see if I
(or someone else) can make it better, but it would be good to understand how
ugly or unmaintainable it really makes the code.
> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4981
> URL: https://issues.apache.org/jira/browse/HADOOP-4981
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vivek Ratan
> Priority: Blocker
> Attachments: 4981.1.patch, 4981.2.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask())
> only if the number of pending tasks for a job is greater than zero (see the
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending
> tasks and only has running tasks, it will never be given a slot, and will
> never have a chance to run a speculative task.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.