[
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368377#comment-14368377
]
Jeff Zhang commented on TEZ-2204:
---------------------------------
This may be related to YARN-2917: Tez has its own AsyncDispatcher, which does
not include the YARN-2917 patch.
Copying the jstack below:
{code}
"Thread-1" prio=5 tid=0x00007f9d13011800 nid=0xe507 in Object.wait() [0x0000000117559000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007fed1c360> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1281)
        - locked <0x00000007fed1c360> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.tez.common.AsyncDispatcher.serviceStop(AsyncDispatcher.java:162)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked <0x00000007fed61000> (a java.lang.Object)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1539)
        at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1674)
        - locked <0x00000007fed0dc50> (a org.apache.tez.dag.app.DAGAppMaster)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked <0x00000007fed0de80> (a java.lang.Object)
        at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run(DAGAppMaster.java:1940)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

   Locked ownable synchronizers:
        - None

"App Shared Pool - #1" daemon prio=5 tid=0x00007f9d13e60800 nid=0xdd03 in Object.wait() [0x000000011714c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1281)
        - locked <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1355)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        - locked <0x00000007ff111ec8> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:962)
        at org.apache.tez.test.TestAMRecovery$ControlledImmediateStartVertexManager.onSourceTaskCompleted(TestAMRecovery.java:601)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:525)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:580)
        - locked <0x00000007fb82fac8> (a org.apache.tez.dag.app.dag.impl.VertexManager)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:575)
        at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:27)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x00000007fbc182d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
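The general hazard behind YARN-2917, as understood here: System.exit() blocks its caller until every shutdown hook has finished, so if a shutdown hook joins a thread that cannot finish until System.exit() returns, neither side makes progress. The sketch below reproduces that cycle in plain Java. It is not Tez or YARN code: simulatedExit, runScenario and the thread names are hypothetical, and simulatedExit only imitates System.exit() so the demo can run without terminating the JVM. The hook uses a timed join so the unsafe case reports false instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ExitJoinDeadlockDemo {

    // Hypothetical stand-in for System.exit(): like the real call, it runs the
    // shutdown hook on its own thread and blocks the caller until the hook ends.
    static void simulatedExit(Runnable shutdownHook) throws InterruptedException {
        Thread hookThread = new Thread(shutdownHook, "shutdown-hook");
        hookThread.start();
        hookThread.join();
    }

    // Returns true if the hook saw the event thread terminate, false if its join
    // timed out. With an untimed join (as in serviceStop), false means a hang.
    static boolean runScenario(boolean exitFromSeparateThread) throws InterruptedException {
        AtomicBoolean stoppedCleanly = new AtomicBoolean(false);
        CountDownLatch hookDone = new CountDownLatch(1);
        Thread[] eventThread = new Thread[1];

        // Mimics a dispatcher's serviceStop() joining its event-handling thread.
        Runnable hook = () -> {
            try {
                eventThread[0].join(500);
                stoppedCleanly.set(!eventThread[0].isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hookDone.countDown();
            }
        };

        // Mimics an event handler that decides to terminate the process.
        Runnable exitCall = () -> {
            try {
                simulatedExit(hook);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        eventThread[0] = new Thread(() -> {
            if (exitFromSeparateThread) {
                new Thread(exitCall, "exit-handler").start(); // hand off and return
            } else {
                exitCall.run(); // blocks inside "exit" until the hook finishes
            }
        }, "event-thread");
        eventThread[0].start();

        hookDone.await(5, TimeUnit.SECONDS);
        return stoppedCleanly.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("exit on event thread:    " + runScenario(false));
        System.out.println("exit on separate thread: " + runScenario(true));
    }
}
```

Handing the exit call off to a separate thread lets the event thread return, so the join in the stop path can complete; as far as can be told here, that is roughly the approach the YARN-2917 patch takes in YARN's AsyncDispatcher. Note that the jstack above does not show which thread AsyncDispatcher.serviceStop is joining, so the exact cycle in TestAMRecovery may differ from this simplified one.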
> TestAMRecovery increasingly flaky on jenkins builds.
> -----------------------------------------------------
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
>
> In recent pre-commit builds and daily builds, there seem to have been some
> occurrences of TestAMRecovery failing or timing out.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)