[ https://issues.apache.org/jira/browse/TEZ-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467951#comment-16467951 ]
Jonathan Eagles commented on TEZ-3932:
--------------------------------------

[~vserrao], thank you for providing the test logs; with them I was able to create a reliable test case that reproduces this issue. I have an initial patch that removes the intermittent failure you have been facing, and I will work with the community to get it checked in. These logs show that this is not just a test issue but could happen in practice during shutdown scenarios.

> TaskSchedulerManager can throw NullPointerException during DAGAppMaster container cleanup race
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3932
>                 URL: https://issues.apache.org/jira/browse/TEZ-3932
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: arch: x86 and ppc
>                      java: openjdk version "1.8.0_161"
>                      OpenJDK Runtime Environment (build 1.8.0_161-b14)
>                      OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
>            Reporter: Valencia Edna Serrao
>            Assignee: Jonathan Eagles
>            Priority: Major
>              Labels: ppc, x86
>         Attachments: TEZ-3932.001.patch, TEZ-3932.fail.patch, org.apache.tez.test.TestExceptionPropagation-output.txt
>
>
> Test org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession fails on x86 and ppc. I found related JIRAs TEZ-3746 and TEZ-3748. Though the issue is marked as resolved in the related JIRAs, the issue still exists. Below are the error details:
> {code:java}
> -------------------------------------------------------------------------------
> Test set: org.apache.tez.test.TestExceptionPropagation
> -------------------------------------------------------------------------------
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 96.433 sec <<< FAILURE!
> testExceptionPropagationSession(org.apache.tez.test.TestExceptionPropagation) Time elapsed: 52.7 sec <<< ERROR!
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1525667420557_0001, yarnApplicationState=FAILED, finalApplicationStatus=FAILED, trackingUrl=N/A, diagnostics=[DAG completed with an ERROR state. Shutting down AM, Session stats:submittedDAGs=11, successfulDAGs=0, failedDAGs=12, killedDAGs=0]
>     at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910)
>     at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1024)
>     at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:1034)
>     at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:652)
>     at org.apache.tez.client.TezClient.submitDAG(TezClient.java:588)
>     at org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession(TestExceptionPropagation.java:227
> {code}