Bikas Saha created TEZ-3117:
-------------------------------
Summary: Deadlock in Edge and Vertex code
Key: TEZ-3117
URL: https://issues.apache.org/jira/browse/TEZ-3117
Project: Apache Tez
Issue Type: Bug
Reporter: Yesha Vora
Assignee: Bikas Saha
{code}
Java-level deadlocks detected
This means that some threads are blocked waiting to enter a synchronization block or
waiting to reenter a synchronization block after an Object.wait() call, where each thread
owns one monitor while trying to obtain another monitor already held by another thread.
Deadlock:
App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
Deadlock:
Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
Thread stacks
App Shared Pool - #1 [WAITING]
sun.misc.Unsafe.park(native method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
java.security.AccessController.doPrivileged(native method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.<null>(unknown source)
Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
java.lang.Thread.<null>(unknown source)
Frozen threads found (potential deadlock)
It seems that the following threads have not changed their stack for more than 10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
java.lang.Thread.run() Thread.java:745
{code}
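For readers who want the shape of the cycle without the full stacks, here is a minimal standalone sketch of the lock-order inversion the dump reports. It is not Tez code: the class LockOrderSketch, the method readLockTimedOut, and the thread names are hypothetical stand-ins. One thread holds an object monitor (as App Shared Pool - #1 holds the Edge monitor) and then asks for a read lock (as in VertexImpl.getTotalTasks), while the other holds the ReentrantReadWriteLock in write mode (assumed here to be how the Dispatcher thread holds it) and then asks for the monitor (as in Edge.getEdgeProperty). So that the sketch terminates, the read lock is requested with a timeout; that timeout is an assumption of the sketch and is not something the dump shows Tez doing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {

    // Returns true if the "pool" thread could not obtain the read lock in time,
    // i.e. the two threads were in the same cycle that the dump reports.
    public static boolean readLockTimedOut(long timeoutMillis) throws InterruptedException {
        // Hypothetical stand-ins: a vertex guarded by a read/write lock,
        // and an edge guarded by its own monitor.
        final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();
        final Object edgeMonitor = new Object();

        final CountDownLatch edgeHeld = new CountDownLatch(1);
        final CountDownLatch vertexHeld = new CountDownLatch(1);
        final AtomicBoolean timedOut = new AtomicBoolean(false);

        // Plays the role of "App Shared Pool - #1": edge monitor first, vertex lock second.
        Thread pool = new Thread(() -> {
            synchronized (edgeMonitor) {
                edgeHeld.countDown();
                try {
                    vertexHeld.await(); // wait until the other thread holds the write lock
                    if (vertexLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        vertexLock.readLock().unlock();
                    } else {
                        // A plain lock() here would block forever.
                        timedOut.set(true);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "pool-thread");

        // Plays the role of "Dispatcher thread {Central}": vertex lock first, edge monitor second.
        Thread dispatcher = new Thread(() -> {
            vertexLock.writeLock().lock();
            try {
                vertexHeld.countDown();
                edgeHeld.await(); // wait until the other thread holds the edge monitor
                synchronized (edgeMonitor) {
                    // Reached only after the pool thread gives up and leaves the monitor.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                vertexLock.writeLock().unlock();
            }
        }, "dispatcher-thread");

        pool.start();
        dispatcher.start();
        pool.join();
        dispatcher.join();
        return timedOut.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read lock timed out: " + readLockTimedOut(200));
    }
}
```

Because the write lock is held for the whole time the pool thread waits, the timed tryLock cannot succeed, and the sketch prints "read lock timed out: true". With an untimed lock() on the read lock, both threads would block indefinitely, which matches the two "waiting to lock" entries in the dump.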