[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332531#comment-16332531
 ] 

Jason Lowe commented on MAPREDUCE-7020:
---------------------------------------

Thanks! That is what I was originally thinking, but looking at it more closely, 
it will fail to stop the task from running like the exit used to. In the case of 
the task-not-found problem I think we'd be OK in the uber case, since we know 
the AM will be feverishly trying to stop the task thread. Even if it fails to 
do so, the task will not be allowed to commit. However, if the task hits task 
limit exceptions or other types of errors, the AM may not have any idea 
something is wrong with the task. Therefore we may want to keep the teardowns 
for those other cases. They are far from ideal, but at least they'll stop the 
job when they're supposed to.

I think completely and cleanly fixing subtask errors in uber AM is going to be 
involved and tricky, so we may not want to try to tackle all of the uber issues 
in this JIRA. I would like to get this unit test fixed, so here's an updated 
proposal:
 - Ditch the IllegalStateException in the listener and replace it with an 
AMFeedback that has the TaskFound bit cleared. That's a gentler and already 
supported way of notifying the task that it is not known.
 - When the TaskReporter sees that the task is unknown, it can shut down its 
thread in uber mode. The AM will be handling the teardown of the subtask.
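
As a rough illustration of the two steps above, here is a minimal, 
self-contained sketch. It is not Hadoop code: Feedback, Listener, Reporter and 
ReporterAction are hypothetical stand-ins for AMFeedback, the listener and the 
TaskReporter. Keeping the process exit for the non-uber case is an assumption 
based on the existing exit behavior.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for AMFeedback: carries only the "task found" bit. */
final class Feedback {
    private final boolean taskFound;

    Feedback(boolean taskFound) {
        this.taskFound = taskFound;
    }

    boolean isTaskFound() {
        return taskFound;
    }
}

/** Hypothetical stand-in for the AM-side listener. */
final class Listener {
    private final Set<String> knownAttempts = ConcurrentHashMap.newKeySet();

    void register(String attemptId) {
        knownAttempts.add(attemptId);
    }

    void unregister(String attemptId) {
        knownAttempts.remove(attemptId);
    }

    /** Step 1: report an unknown attempt via the feedback instead of throwing. */
    Feedback statusUpdate(String attemptId) {
        return new Feedback(knownAttempts.contains(attemptId));
    }
}

/** What the reporter decides to do after one status update. */
enum ReporterAction { CONTINUE, STOP_REPORTER_THREAD, EXIT_PROCESS }

/** Hypothetical stand-in for the task-side reporter. */
final class Reporter {
    private final Listener listener;
    private final boolean uberMode;

    Reporter(Listener listener, boolean uberMode) {
        this.listener = listener;
        this.uberMode = uberMode;
    }

    /** Step 2: in uber mode, stop only the reporter thread; the AM tears down the subtask. */
    ReporterAction poll(String attemptId) {
        Feedback feedback = listener.statusUpdate(attemptId);
        if (feedback.isTaskFound()) {
            return ReporterAction.CONTINUE;
        }
        // Assumption: outside uber mode the task runs in its own JVM, so exiting is still safe.
        return uberMode ? ReporterAction.STOP_REPORTER_THREAD : ReporterAction.EXIT_PROCESS;
    }
}
```

The point of returning an action rather than calling exit directly is that the 
uber case never takes the AM's JVM down with it.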

For the other cases where the reporter is calling exit, we'd have to carefully 
weigh the risk of letting the task continue vs. tearing down the AM. That 
may be better to tackle in a follow-up JIRA since it could get tricky. Thoughts?

> Task timeout in uber mode can crash AM
> --------------------------------------
>
>                 Key: MAPREDUCE-7020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7020
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Akira Ajisaka
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7020-001.patch, MAPREDUCE-7020-002.patch
>
>
> TestUberAM is failing
> {noformat}
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/testReport/junit/org.apache.hadoop.mapreduce.v2/TestUberAM/testThreadDumpOnTaskTimeout/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

