[ https://issues.apache.org/jira/browse/HADOOP-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat reassigned HADOOP-4977:
----------------------------------

    Assignee: Amar Kamat

> Deadlock between reclaimCapacity and assignTasks
> ------------------------------------------------
>
>                 Key: HADOOP-4977
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4977
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: jstack.txt
>
>
> I was running the latest trunk with the capacity scheduler and saw the 
> JobTracker lock up with the following deadlock reported in jstack:
> Found one Java-level deadlock:
> =============================
> "18107...@qtp0-4":
>   waiting to lock monitor 0x08085b40 (object 0x56605100, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 4 on 54311"
> "IPC Server handler 4 on 54311":
>   waiting to lock monitor 0x0808594c (object 0x5660e518, a org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr),
>   which is held by "reclaimCapacity"
> "reclaimCapacity":
>   waiting to lock monitor 0x08085b40 (object 0x56605100, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 4 on 54311"
> Java stack information for the threads listed above:
> ===================================================
> "18107...@qtp0-4":
>       at org.apache.hadoop.mapred.JobTracker.getClusterStatus(JobTracker.java:2695)
>       - waiting to lock <0x56605100> (a org.apache.hadoop.mapred.JobTracker)
>       at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:93)
>       at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>       at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>       at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>       at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>       at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>       at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>       at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>       at org.mortbay.jetty.Server.handle(Server.java:324)
>       at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>       at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>       at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>       at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>       at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>       at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>       at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> "IPC Server handler 4 on 54311":
>       at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.updateQSIObjects(CapacityTaskScheduler.java:564)
>       - waiting to lock <0x5660e518> (a org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr)
>       at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:855)
>       at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$1000(CapacityTaskScheduler.java:294)
>       at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1336)
>       - locked <0x5660dd20> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
>       at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2288)
>       - locked <0x56605100> (a org.apache.hadoop.mapred.JobTracker)
>       at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> Unfortunately I failed to capture all of the jstack output, so some of it is
> missing. It appears that reclaimCapacity locks the MapSchedulingMgr and then
> tries to lock the JobTracker, whereas updateQSIObjects, called from
> assignTasks, runs while the JobTracker lock is held (the JobTracker takes its
> own lock in heartbeat before calling assignTasks) and then tries to lock the
> MapSchedulingMgr. The other thread listed is a Jetty thread serving the web
> interface; it is blocked on the JobTracker lock but is not part of the cycle.
> A possible fix would be to lock the JobTracker in reclaimCapacity before
> locking anything else.
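
The lock-ordering fix suggested in the report can be sketched in isolation. The
example below is a minimal illustration, not the actual Hadoop code: the class
LockOrderingSketch, its two plain Object monitors, and the method bodies are
hypothetical stand-ins for the JobTracker and MapSchedulingMgr locks. Both code
paths take the monitors in the same order (JobTracker first), so neither thread
can hold one lock while waiting for the other in the opposite order.

```java
class LockOrderingSketch {
    // Hypothetical stand-ins for the JobTracker and MapSchedulingMgr monitors.
    private final Object jobTracker = new Object();
    private final Object schedulingMgr = new Object();
    private int operations = 0;

    // Mirrors the heartbeat path in the trace: JobTracker lock first,
    // then the scheduling manager lock.
    void assignTasks() {
        synchronized (jobTracker) {
            synchronized (schedulingMgr) {
                operations++;
            }
        }
    }

    // Suggested ordering for the reclaim path: take the JobTracker lock
    // before the scheduling manager lock, matching assignTasks().
    void reclaimCapacity() {
        synchronized (jobTracker) {
            synchronized (schedulingMgr) {
                operations++;
            }
        }
    }

    // Runs both paths concurrently and returns the number of completed operations.
    static int runBoth(int iterations) {
        LockOrderingSketch sketch = new LockOrderingSketch();
        Thread heartbeat = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.assignTasks();
        }, "heartbeat-sketch");
        Thread reclaim = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.reclaimCapacity();
        }, "reclaimCapacity-sketch");
        heartbeat.start();
        reclaim.start();
        try {
            // join() also makes the final counter value visible to this thread.
            heartbeat.join();
            reclaim.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sketch.operations;
    }

    public static void main(String[] args) {
        System.out.println(runBoth(10_000));
    }
}
```

If reclaimCapacity() instead took schedulingMgr before jobTracker, the two
threads could each hold one monitor and wait for the other, which is the cycle
shown in the jstack output above.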

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
