[ https://issues.apache.org/jira/browse/MAPREDUCE-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated MAPREDUCE-7048:
------------------------------------
    Attachment: MAPREDUCE-7048-branch-2.9.01.patch
                MAPREDUCE-7048-branch-2.8.01.patch
                MAPREDUCE-7048-branch-2.7.01.patch
                MAPREDUCE-7048-branch-2.01.patch

> AM can still crash after MAPREDUCE-7020
> ---------------------------------------
>
>                 Key: MAPREDUCE-7048
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7048
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7048-001.patch, MAPREDUCE-7048-002.patch,
>                      MAPREDUCE-7048-003.patch, MAPREDUCE-7048-branch-2.01.patch,
>                      MAPREDUCE-7048-branch-2.7.01.patch, MAPREDUCE-7048-branch-2.8.01.patch,
>                      MAPREDUCE-7048-branch-2.9.01.patch
>
>
> The testcase TestUberAM#testThreadDumpOnTaskTimeout was supposed to be fixed
> by MAPREDUCE-7020. However, it still fails, see:
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7325/testReport/junit/org.apache.hadoop.mapreduce.v2/TestMRJobs/testThreadDumpOnTaskTimeout/
> (note: other tests failed as well, but those look unrelated).
>
> When I tried to reproduce it locally, it failed again, although with a
> slightly different error message (it was actually the same message as before):
> {noformat}
> [INFO] -------------------------------------------------------
> [INFO]  T E S T S
> [INFO] -------------------------------------------------------
> [INFO] Running org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 128.192 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestUberAM)  Time elapsed: 79.539 s  <<< FAILURE!
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
>
> *Root cause:* {{System.exit()}} is still invoked in {{Task.statusUpdate()}}:
> {noformat}
>   public void statusUpdate(TaskUmbilicalProtocol umbilical)
>   throws IOException {
>     int retries = MAX_RETRIES;
>     while (true) {
>       try {
>         if (!umbilical.statusUpdate(getTaskID(), taskStatus).getTaskFound()) {
>           LOG.warn("Parent died.  Exiting "+taskId);
>           System.exit(66);
>         }
>         taskStatus.clearStatus();
>         return;
> ...
> {noformat}
> At this point, the task was not found, so {{getTaskFound()}} on the result of
> {{umbilical.statusUpdate()}} returns false and {{System.exit(66)}} is reached.
> Checking whether we run in uber mode seems to solve the problem.
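For illustration only, the uber-mode check mentioned in the root-cause description could look roughly like the sketch below. It is not the attached patch: the class `UberAwareStatusUpdate`, the `Umbilical` interface, the `Outcome` enum, the `uberized` flag, and the injected `exitHandler` are all hypothetical stand-ins, chosen so the example runs without Hadoop on the classpath. The only assumption taken from Hadoop itself is that in uber mode the task runs inside the AM's JVM, so calling `System.exit()` there would terminate the AM as well.

```java
import java.util.function.IntConsumer;

public class UberAwareStatusUpdate {

  /** Hypothetical stand-in for the umbilical call; false means the task was not found. */
  interface Umbilical {
    boolean taskFound(String taskId);
  }

  /** Possible results of one status update attempt in this sketch. */
  enum Outcome { UPDATED, TASK_GONE_UBER, EXIT_REQUESTED }

  /**
   * Reports status once. If the task is unknown, the JVM exit is only
   * requested when the task does NOT share a JVM with the AM.
   */
  static Outcome statusUpdate(Umbilical umbilical, String taskId,
                              boolean uberized, IntConsumer exitHandler) {
    if (umbilical.taskFound(taskId)) {
      return Outcome.UPDATED;
    }
    if (uberized) {
      // Uber mode: exiting here would take the AM down, so just stop updating.
      System.err.println("Task no longer available: " + taskId);
      return Outcome.TASK_GONE_UBER;
    }
    // Separate child JVM: mirror the original behaviour and exit with code 66.
    System.err.println("Parent died.  Exiting " + taskId);
    exitHandler.accept(66);
    return Outcome.EXIT_REQUESTED;
  }

  public static void main(String[] args) {
    Umbilical taskMissing = id -> false;   // simulates getTaskFound() == false
    int[] exitCode = {-1};                 // records the code instead of exiting

    Outcome uber = statusUpdate(taskMissing, "task_1", true, c -> exitCode[0] = c);
    System.out.println(uber + " exitCode=" + exitCode[0]);   // TASK_GONE_UBER exitCode=-1

    Outcome child = statusUpdate(taskMissing, "task_1", false, c -> exitCode[0] = c);
    System.out.println(child + " exitCode=" + exitCode[0]);  // EXIT_REQUESTED exitCode=66
  }
}
```

Passing the exit action in as a parameter is a choice made for this sketch so that both branches can be exercised in a test; production code would call its real exit routine in the non-uber branch.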