[ https://issues.apache.org/jira/browse/MAPREDUCE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332531#comment-16332531 ]
Jason Lowe commented on MAPREDUCE-7020:
---------------------------------------

Thanks! That is what I was originally thinking, but looking at it more closely, it will fail to stop the task from running the way the exit used to.

In the case of the task-not-found problem I think we'd be OK in the uber case, since we know the AM will be feverishly trying to stop the task thread. Even if it fails to do so, the task will not be allowed to commit. However, if the task hits task limit exceptions or other types of errors, the AM may not have any idea something is wrong with the task. Therefore we may want to keep the teardowns for some other cases. They are far from ideal, but at least they'll stop the job when they're supposed to.

I think completely and cleanly fixing subtask errors in the uber AM is going to be involved and tricky, so we may not want to try to tackle all of the uber issues in this JIRA. I would like to get this unit test fixed, so here's an updated proposal:

- Ditch the IllegalStateException in the listener and replace it with an AMFeedback that has the TaskFound bit cleared. That's a gentler and already supported way of notifying the task that it is not known.
- When the TaskReporter sees that the task is unknown, it can shut down its thread in uber mode. The AM will be handling the teardown of the subtask.

For the other cases where the reporter is calling exit, we'd have to weigh carefully the risk of letting the task continue vs. tearing down the AM. That may be better to tackle in a follow-up JIRA since it could get tricky.

Thoughts?
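To make the two bullets of the proposal concrete, here is a rough, self-contained sketch of the control flow. It is not the actual patch and does not use the real Hadoop classes: `Feedback`, `Listener`, `Action`, and `onFeedback` are hypothetical stand-ins for AMFeedback, the listener, and the TaskReporter logic, and the non-uber branch (exiting the task process) is an assumption based on the existing exit behavior mentioned above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the proposal. None of these types are
// the real Hadoop classes; they only mirror the roles named in the comment.
class UberReporterSketch {

    /** Stand-in for AMFeedback: carries a "task found" bit back to the task. */
    static final class Feedback {
        private final boolean taskFound;
        Feedback(boolean taskFound) { this.taskFound = taskFound; }
        boolean isTaskFound() { return taskFound; }
    }

    /** Stand-in for the AM-side listener that receives status updates. */
    static final class Listener {
        private final Set<String> knownTasks = ConcurrentHashMap.newKeySet();

        void register(String taskId) { knownTasks.add(taskId); }
        void unregister(String taskId) { knownTasks.remove(taskId); }

        Feedback statusUpdate(String taskId) {
            // Proposal, bullet 1: do not throw IllegalStateException for an
            // unknown task; return feedback with the "task found" bit cleared.
            return new Feedback(knownTasks.contains(taskId));
        }
    }

    /** What the reporter does after reading the feedback. */
    enum Action { CONTINUE, STOP_REPORTER_THREAD, EXIT_PROCESS }

    /** Stand-in for the task-side reporter decision. */
    static Action onFeedback(Feedback feedback, boolean uberMode) {
        if (feedback.isTaskFound()) {
            return Action.CONTINUE;
        }
        // Proposal, bullet 2: in uber mode only the reporter thread stops and
        // the AM tears down the subtask. Outside uber mode this sketch assumes
        // the task process exits, as the existing exit call does.
        return uberMode ? Action.STOP_REPORTER_THREAD : Action.EXIT_PROCESS;
    }

    public static void main(String[] args) {
        Listener listener = new Listener();
        listener.register("task_1");
        System.out.println(onFeedback(listener.statusUpdate("task_1"), true));

        // Simulate the AM forgetting the task, e.g. after a timeout.
        listener.unregister("task_1");
        System.out.println(onFeedback(listener.statusUpdate("task_1"), true));
        System.out.println(onFeedback(listener.statusUpdate("task_1"), false));
    }
}
```

The point of the sketch is only that an unknown task becomes a normal return value rather than an exception, so the uber-mode reporter can stop quietly without taking the AM down with it. The other exit cases discussed above are deliberately left out.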
> Task timeout in uber mode can crash AM
> --------------------------------------
>
>                 Key: MAPREDUCE-7020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7020
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>   Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Akira Ajisaka
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7020-001.patch, MAPREDUCE-7020-002.patch
>
>
> TestUberAM is failing
> {noformat}
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/testReport/junit/org.apache.hadoop.mapreduce.v2/TestUberAM/testThreadDumpOnTaskTimeout/