[
https://issues.apache.org/jira/browse/MAPREDUCE-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357235#comment-16357235
]
Jason Lowe commented on MAPREDUCE-7048:
---------------------------------------
Thanks for updating the patch!
Now that we're looking up uberized all the time, I think it makes sense to just
do this once when the task is configured (i.e.: make it a field that is
initialized in the setConf method). Then we don't have to do conf key lookups
every time we do a status update.
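A minimal sketch of the caching idea, for illustration only. SimpleConf and CachedUberTask are hypothetical stand-ins, not the real Hadoop Configuration or Task classes, and the key name is an assumption. The point is only that the flag is read once in setConf and each status update reads the cached field:

```java
import java.util.HashMap;
import java.util.Map;

class UberFlagDemo {
    // Hypothetical stand-in for a Hadoop-style Configuration; not the real class.
    static class SimpleConf {
        private final Map<String, String> values = new HashMap<>();
        int lookups = 0; // counts key lookups so the saving is visible

        void setBoolean(String key, boolean value) {
            values.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            lookups++;
            String v = values.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // Hypothetical task class: reads the flag once in setConf and caches it.
    static class CachedUberTask {
        // Assumed key name, used here for illustration only.
        static final String UBER_KEY = "mapreduce.task.uberized";

        private boolean uberized;

        void setConf(SimpleConf conf) {
            // The only conf lookup happens here, at configuration time.
            this.uberized = conf.getBoolean(UBER_KEY, false);
        }

        // Cheap enough to call on every status update: no conf lookup.
        boolean isUberized() {
            return uberized;
        }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.setBoolean(CachedUberTask.UBER_KEY, true);

        CachedUberTask task = new CachedUberTask();
        task.setConf(conf);

        // Simulate many status updates that each need the flag.
        boolean last = false;
        for (int i = 0; i < 1000; i++) {
            last = task.isUberized();
        }
        System.out.println("uberized=" + last + ", conf lookups=" + conf.lookups);
    }
}
```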
Rather than mess with the security manager, it would be simpler to change the
System.exit calls to use ExitUtil.terminate. Task is already doing this in
another place, and arguably it should be consistent. Then the test for
non-uber mode can be just as simple as the uber test by making sure
ExitUtil.disableSystemExit is called and adding
{{expected=ExitException.class}} to the Test annotation.
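To make the pattern concrete, here is a rough sketch under stated assumptions. ExitGuard and its ExitException are hypothetical stand-ins that only mimic the idea (a switch that turns a process exit into an exception a test can catch); they are not Hadoop's ExitUtil, and statusUpdate is reduced to the "parent died" branch:

```java
class ExitGuardDemo {
    // Hypothetical exception carrying the requested exit status.
    static class ExitException extends RuntimeException {
        final int status;

        ExitException(int status, String message) {
            super(message);
            this.status = status;
        }
    }

    // Hypothetical stand-in for an exit helper; not the real Hadoop class.
    static final class ExitGuard {
        private static volatile boolean systemExitDisabled = false;

        // Tests call this first so that terminate() throws instead of exiting.
        static void disableSystemExit() {
            systemExitDisabled = true;
        }

        static void terminate(int status, String message) {
            if (systemExitDisabled) {
                throw new ExitException(status, message);
            }
            System.exit(status);
        }
    }

    // Simplified "parent died" branch: goes through the helper rather than
    // calling System.exit directly.
    static void statusUpdate(boolean taskFound) {
        if (!taskFound) {
            ExitGuard.terminate(66, "Parent died. Exiting");
        }
    }

    public static void main(String[] args) {
        ExitGuard.disableSystemExit();
        try {
            statusUpdate(false);
            System.out.println("no exit requested");
        } catch (ExitException e) {
            System.out.println("exit intercepted, status " + e.status);
        }
    }
}
```

With a helper like this, a JUnit 4 test only has to disable the real exit in its setup and declare the exception as expected; no security manager is involved.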
> AM can still crash after MAPREDUCE-7020
> ---------------------------------------
>
> Key: MAPREDUCE-7048
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7048
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: MAPREDUCE-7048-001.patch, MAPREDUCE-7048-002.patch
>
>
> The testcase TestUberAM#testThreadDumpOnTaskTimeout was supposed to be fixed
> by MAPREDUCE-7020. However, it still fails, see:
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7325/testReport/junit/org.apache.hadoop.mapreduce.v2/TestMRJobs/testThreadDumpOnTaskTimeout/
> (note: other tests failed as well, but those look unrelated).
> When I tried to reproduce it locally, it failed again, although with a
> slightly different error message (it was actually the same as before):
> {noformat}
> [INFO] -------------------------------------------------------
> [INFO] T E S T S
> [INFO] -------------------------------------------------------
> [INFO] Running org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 128.192 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestUberAM) Time elapsed: 79.539 s <<< FAILURE!
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> *Root cause:* {{System.exit()}} is still invoked at {{Task.statusUpdate()}}
> {noformat}
> public void statusUpdate(TaskUmbilicalProtocol umbilical)
>     throws IOException {
>   int retries = MAX_RETRIES;
>   while (true) {
>     try {
>       if (!umbilical.statusUpdate(getTaskID(), taskStatus).getTaskFound()) {
>         LOG.warn("Parent died. Exiting "+taskId);
>         System.exit(66);
>       }
>       taskStatus.clearStatus();
>       return;
> ...
> {noformat}
> At this point, the task was not found, so {{getTaskFound()}} on the result of
> {{umbilical.statusUpdate()}} returns false. Checking whether we are running in
> uber mode before calling {{System.exit()}} seems to solve the problem.