[
https://issues.apache.org/jira/browse/TEZ-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bikas Saha updated TEZ-3117:
----------------------------
Attachment: TEZ-3117.1.patch
> Deadlock in Edge and Vertex code
> --------------------------------
>
> Key: TEZ-3117
> URL: https://issues.apache.org/jira/browse/TEZ-3117
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Bikas Saha
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3117.1.patch, TEZ-3117.1.patch
>
>
> {code}
> Java-level deadlocks detected
>
> This means that some threads are blocked waiting to enter a synchronization
> block or waiting to reenter a synchronization block after an Object.wait()
> call, where each thread owns one monitor while trying to obtain another
> monitor already held by another thread.
>
> Deadlock:
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
>
> Deadlock:
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Thread stacks
> App Shared Pool - #1 [WAITING]
> sun.misc.Unsafe.park(native method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
> org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
> org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
> org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
> org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
> org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
> java.security.AccessController.doPrivileged(native method)
> javax.security.auth.Subject.doAs(Subject.java:422)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.<null>(unknown source)
> Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
> org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
> org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
> org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
> org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
> org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
> org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
> org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
> org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> java.lang.Thread.<null>(unknown source)
> Frozen threads found (potential deadlock)
>
> It seems that the following threads have not changed their stack for more
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>
> client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
> org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
> java.lang.Thread.run() Thread.java:745
> {code}
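
Reading the dump above, the cycle looks like a lock-order inversion. The pool thread is inside Edge.routingToBegin, holding the Edge monitor, and wants the Vertex read lock in VertexImpl.getTotalTasks. The dispatcher thread holds the same Vertex ReentrantReadWriteLock (presumably for writing, since a reader is blocked on it) and wants the Edge monitor in Edge.getEdgeProperty. The following is only a minimal sketch of that shape and of one possible reordering. It is not the content of TEZ-3117.1.patch. VertexLike, EdgeLike, handleStart, routingToBeginInverted, routingToBeginOrdered and runOrdered are hypothetical names made up for illustration and are not Tez APIs.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockOrderSketch {

    /** Hypothetical stand-in for a vertex guarded by a read/write lock. */
    static final class VertexLike {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final int totalTasks = 4;
        EdgeLike edge; // assigned once, before any worker thread starts

        int getTotalTasks() {
            lock.readLock().lock();
            try {
                return totalTasks;
            } finally {
                lock.readLock().unlock();
            }
        }

        /** Dispatcher-style path: vertex write lock first, then the edge monitor. */
        String handleStart() {
            lock.writeLock().lock();
            try {
                return edge.getEdgeProperty();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    /** Hypothetical stand-in for an edge guarded by its own monitor. */
    static final class EdgeLike {
        private final VertexLike destination;

        EdgeLike(VertexLike destination) {
            this.destination = destination;
        }

        synchronized String getEdgeProperty() {
            return "edge-property";
        }

        /** Shape seen in the dump: edge monitor held while requesting the vertex lock. */
        synchronized int routingToBeginInverted() {
            return destination.getTotalTasks();
        }

        /** Reordered: read the vertex state first, then enter the edge monitor. */
        int routingToBeginOrdered() {
            int numTasks = destination.getTotalTasks();
            synchronized (this) {
                return numTasks;
            }
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish in time. */
    static boolean runOrdered(int iterations) {
        VertexLike vertex = new VertexLike();
        vertex.edge = new EdgeLike(vertex);

        Thread pool = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                vertex.edge.routingToBeginOrdered();
            }
        }, "pool-like");
        Thread dispatcher = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                vertex.handleStart();
            }
        }, "dispatcher-like");
        pool.setDaemon(true);
        dispatcher.setDaemon(true);
        pool.start();
        dispatcher.start();
        try {
            pool.join(TimeUnit.SECONDS.toMillis(10));
            dispatcher.join(TimeUnit.SECONDS.toMillis(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !pool.isAlive() && !dispatcher.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without hanging: " + runOrdered(10_000));
    }
}
```

With routingToBeginOrdered the pool-like thread never waits for the vertex lock while holding the edge monitor, so both paths take the locks in the same order (vertex, then edge). Swapping in routingToBeginInverted restores the shape from the dump and can hang under the right interleaving.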