[ https://issues.apache.org/jira/browse/TEZ-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-3117:
----------------------------
    Attachment: TEZ-3117.1.patch

> Deadlock in Edge and Vertex code
> --------------------------------
>
>                 Key: TEZ-3117
>                 URL: https://issues.apache.org/jira/browse/TEZ-3117
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Bikas Saha
>             Fix For: 0.7.1, 0.8.3
>
>         Attachments: TEZ-3117.1.patch, TEZ-3117.1.patch
>
>
> {code}
> Java-level deadlocks detected
>  
> This means that some threads are blocked waiting to enter a synchronization block or
> waiting to reenter a synchronization block after an Object.wait() call, where each thread
> owns one monitor while trying to obtain another monitor already held by another thread.
>  
> Deadlock:
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
>  
> Deadlock:
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
>  
> Thread stacks
> App Shared Pool - #1 [WAITING]
>  sun.misc.Unsafe.park(native method)
>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>  java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
>  org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
>  org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
>  java.security.AccessController.doPrivileged(native method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
>  java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  java.lang.Thread.<null>(unknown source)
> Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
>  org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
>  org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
>  org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
>  org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
>  org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
>  org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
>  org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>  java.lang.Thread.<null>(unknown source)
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
> org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
> java.lang.Thread.run() Thread.java:745
> {code}
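
The cycle in the dump above is a lock-order inversion between an object monitor and a ReentrantReadWriteLock. The following is only a minimal sketch of that pattern, not Tez code and not the attached patch: edgeMonitor and vertexLock are hypothetical stand-ins for the Edge monitor and the vertex lock, and the assumption that the Dispatcher side holds the write lock is inferred from the dump reporting the lock as "held by" that thread. It uses ThreadMXBean.findDeadlockedThreads() to confirm the cycle, in the same way the dump reports it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two locks named in the dump.
    private static final Object edgeMonitor = new Object();
    private static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();

    // Count down, then wait until the other thread also holds its first lock.
    private static void arrive(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts two daemon threads that deadlock; returns the thread ids the JVM reports, or null. */
    public static long[] reproduce() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Order A: monitor first, then read lock (compare Edge.routingToBegin
        // calling into VertexImpl.getTotalTasks in the trace).
        Thread poolLike = new Thread(() -> {
            synchronized (edgeMonitor) {
                arrive(bothHoldFirstLock);
                vertexLock.readLock().lock();
                vertexLock.readLock().unlock();
            }
        }, "pool-like");

        // Order B: write lock first (assumed), then monitor (compare the
        // Dispatcher stack blocking in the synchronized Edge.getEdgeProperty).
        Thread dispatcherLike = new Thread(() -> {
            vertexLock.writeLock().lock();
            try {
                arrive(bothHoldFirstLock);
                synchronized (edgeMonitor) {
                    // never reached once the cycle has formed
                }
            } finally {
                vertexLock.writeLock().unlock();
            }
        }, "dispatcher-like");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        poolLike.setDaemon(true);
        dispatcherLike.setDaemon(true);
        poolLike.start();
        dispatcherLike.start();

        // Poll for up to about five seconds until the JVM detects the cycle.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(100);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = reproduce();
        System.out.println(ids == null ? "no deadlock detected" : ids.length + " threads deadlocked");
    }
}
```

In a sketch like this, the cycle disappears if both threads take the two locks in the same order, or if one side reads the value it needs before entering the second lock. Which of these, if either, the attached patch applies is not stated in this message.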



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
