[jira] Updated: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler

Matei Zaharia (JIRA) Sun, 14 Feb 2010 19:43:54 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Matei Zaharia updated MAPREDUCE-1436:
-------------------------------------

    Attachment: mapreduce-1436-v2.patch

Here's a new patch that always locks the JobTracker before locking the 
FairScheduler in update(). This should resolve both of the deadlocks reported 
above. I've also increased the default update interval from 0.5 seconds to 2.5 
seconds in this patch. The only negative impact of this should be that 
preemption and speculation take slightly longer to kick in. Those two are 
really the only reasons we need to call update() other than when jobs are 
added and removed: speculative tasks are counted in updateDemand, and 
preemption is checked regularly in updatePreemptionVariables().
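
As a rough illustration of the lock ordering described above, here is a 
minimal sketch, not the actual patch. The two plain Object monitors are 
hypothetical stand-ins for the JobTracker and FairScheduler instances, and the 
heartbeat() method assumes the JobTracker holds its own lock when it calls 
into the scheduler. Because both paths take the JobTracker lock before the 
scheduler lock, neither thread can hold one lock while waiting for the other.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the JobTracker and FairScheduler monitors.
    static final Object jobTracker = new Object();
    static final Object fairScheduler = new Object();
    static int updates = 0;
    static int assignments = 0;

    // Periodic update path: JobTracker lock first, then the scheduler lock.
    static void update() {
        synchronized (jobTracker) {
            synchronized (fairScheduler) {
                updates++;
            }
        }
    }

    // Heartbeat path (assumed): the JobTracker lock is already held
    // when the scheduler is asked to assign tasks.
    static void heartbeat() {
        synchronized (jobTracker) {
            assignTasks();
        }
    }

    // Scheduler entry point: takes only the scheduler lock.
    static void assignTasks() {
        synchronized (fairScheduler) {
            assignments++;
        }
    }

    // Runs both paths concurrently and returns {updates, assignments}.
    static int[] runBoth(final int iterations) {
        updates = 0;
        assignments = 0;
        Thread updater = new Thread(() -> {
            for (int i = 0; i < iterations; i++) update();
        });
        Thread heartbeats = new Thread(() -> {
            for (int i = 0; i < iterations; i++) heartbeat();
        });
        updater.start();
        heartbeats.start();
        try {
            updater.join();
            heartbeats.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { updates, assignments };
    }

    public static void main(String[] args) {
        int[] counts = runBoth(100000);
        System.out.println(counts[0] + " updates, " + counts[1] + " assignments");
    }
}
```

If update() instead took the scheduler lock first and then needed the 
JobTracker lock, the two threads could each hold one lock and wait forever for 
the other, which is the general shape of a lock-order deadlock.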

I've also thought a bit about the impact of coarser locking on the 
performance of the JobTracker, and I think it's actually small. First of all, 
since assignTasks already locks the FairScheduler, we wouldn't gain much by 
locking only the FS in update() and not the JT, because the JT calls 
assignTasks on every heartbeat anyway. Second, I timed update() on a simulated 
cluster with 2500 nodes, 4 slots per node, 100 jobs and 20 pools, and one call 
to update() took about 50 ms. With the new default update interval of 2500 ms, 
only about 2% (50 ms / 2500 ms) of the time in the JobTracker should be spent 
on this (and for such a large cluster, the update interval can be raised 
through the config file anyway).
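
The overhead estimate above is simple arithmetic, sketched here for both the 
old and new default intervals. The class and method names are hypothetical, 
and the sketch assumes one update() call starts every interval and that the 
50 ms cost measured on the simulated cluster holds.

```java
import java.util.Locale;

public class UpdateOverhead {
    // Fraction of wall-clock time spent in update(), assuming one call
    // of the given cost begins every intervalMs milliseconds.
    static double fraction(double updateCostMs, double intervalMs) {
        return updateCostMs / intervalMs;
    }

    public static void main(String[] args) {
        double costMs = 50.0; // measured cost of one update() call
        // Old default interval of 0.5 s and new default of 2.5 s.
        System.out.printf(Locale.ROOT, "old: %.1f%%%n", 100 * fraction(costMs, 500.0));
        System.out.printf(Locale.ROOT, "new: %.1f%%%n", 100 * fraction(costMs, 2500.0));
    }
}
```

Under these assumptions the old 500 ms interval would have spent about 10% of 
the time in update(), against about 2% with the new 2500 ms interval.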

> Deadlock in preemption code in fair scheduler
> ---------------------------------------------
>
>                 Key: MAPREDUCE-1436
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1436
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Blocker
>         Attachments: deadlock.png, mapreduce-1436-v2.patch, 
> mapreduce-1436.patch
>
>
> In testing the fair scheduler with preemption, I found a deadlock between 
> updatePreemptionVariables and some code in the JobTracker. This was found 
> while testing a backport of the fair scheduler to Hadoop 0.20, but it looks 
> like it could also happen in trunk and 0.21. Details are in a comment below.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
