[ https://issues.apache.org/jira/browse/TEZ-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated TEZ-3117:
----------------------------
    Attachment: TEZ-3117.1.patch

> Deadlock in Edge and Vertex code
> --------------------------------
>
> Key: TEZ-3117
> URL: https://issues.apache.org/jira/browse/TEZ-3117
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Bikas Saha
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3117.1.patch, TEZ-3117.1.patch
>
>
> {code}
> Java-level deadlocks detected
>
> This means that some threads are blocked waiting to enter a synchronization block or
> waiting to reenter a synchronization block after an Object.wait() call, where each thread
> owns one monitor while trying to obtain another monitor already held by another thread.
>
> Deadlock:
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
>
> Deadlock:
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
>
> Thread stacks
>
> App Shared Pool - #1 [WAITING]
> sun.misc.Unsafe.park(native method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
> org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
> org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
> org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
> org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
> org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
> java.security.AccessController.doPrivileged(native method)
> javax.security.auth.Subject.doAs(Subject.java:422)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.<null>(unknown source)
>
> Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
> org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
> org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
> org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
> org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
> org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
> org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
> org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
> org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> java.lang.Thread.<null>(unknown source)
>
> Frozen threads found (potential deadlock)
>
> It seems that the following threads have not changed their stack for more than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>
> client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
> org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
> java.lang.Thread.run() Thread.java:745
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
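
For readers following the dump above: it shows a lock-order inversion. The App Shared Pool thread holds the Edge monitor and waits for the vertex's read lock (in VertexImpl.getTotalTasks), while the Dispatcher thread holds that ReentrantReadWriteLock and waits for the Edge monitor (in Edge.getEdgeProperty). The sketch below reproduces only that pattern in isolation. It is not Tez code and not the attached patch: the class name, the two lock objects, and the assumption that the dispatcher holds the write lock are stand-ins chosen for illustration. It assumes a JVM that supports ownable-synchronizer monitoring (as HotSpot does), which ThreadMXBean.findDeadlockedThreads() requires.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the Edge/Vertex locking pattern; not the real Tez classes.
class LockOrderSketch {

    // Count down, then wait until the other thread also holds its first lock.
    static void arrive(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts two threads that take the locks in opposite order; returns how many threads the JVM reports as deadlocked. */
    static int reproduce() {
        final Object edgeMonitor = new Object();                                  // plays the Edge monitor
        final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();   // plays the vertex lock
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread pool = new Thread(() -> {
            synchronized (edgeMonitor) {              // first: Edge monitor
                arrive(bothHoldFirstLock);
                vertexLock.readLock().lock();         // second: vertex read lock (parks forever)
                vertexLock.readLock().unlock();
            }
        }, "pool-thread-sketch");

        Thread dispatcher = new Thread(() -> {
            vertexLock.writeLock().lock();            // first: vertex write lock (assumed)
            try {
                arrive(bothHoldFirstLock);
                synchronized (edgeMonitor) {          // second: Edge monitor (blocks forever)
                    // never reached
                }
            } finally {
                vertexLock.writeLock().unlock();
            }
        }, "dispatcher-thread-sketch");

        // Daemon threads so the JVM can still exit while they stay deadlocked.
        pool.setDaemon(true);
        dispatcher.setDaemon(true);
        pool.start();
        dispatcher.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {           // poll for up to about 5 seconds
                long[] ids = mx.findDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

The usual remedies for this pattern are to acquire the two locks in one consistent order on every path, or to avoid calling into the other object while holding a lock. Which approach TEZ-3117.1.patch takes is not stated in this message.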