[
https://issues.apache.org/jira/browse/MAPREDUCE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888185#comment-13888185
]
Mit Desai commented on MAPREDUCE-5688:
--------------------------------------
Thanks, Jon, for the review!
bq. Re-title this jira since this is not a test problem according to the patch,
but a race condition in the MRAppMaster that is exposed most frequently via
this test.
I have renamed this JIRA accordingly, with one clarification: I do not see
this problem as a race. The bug is exposed when this test runs on JDK7, where
the test methods execute in a random order.
bq. Add your analysis to the jira so that the actual problem is documented and
captured for future use.
This failure is intermittent. It occurs only when the tests in
TestStagingCleanup run in a particular order, for example
testDeletionofStagingOnReboot() followed by testDeletionofStagingOnKillLastTry().
The failure comes from notifyIsLastAMRetry(). When this function is called, it
calls setForcejobCompletion(). If appMaster.stop() is called after
setForcejobCompletion(), it tries to stop an appMaster that was already forced
to stop, and it gets an NPE while doing so. If appMaster.stop() is called
first, we do not get the NPE when forceJobCompletion is attempted, because
there is already a null check before it proceeds.
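
To make the ordering argument concrete, here is a minimal sketch of the pattern described above. It is not the MRAppMaster or JobHistoryEventHandler code: SketchHandler, its writer field, and both method bodies are hypothetical stand-ins. The only assumptions taken from the analysis are that the forced-completion path has a null check and the stop path does not.

```java
/** Hypothetical stand-in for a service that owns a resource; not Hadoop code. */
class SketchHandler {
    private StringBuilder writer = new StringBuilder();

    /** Guarded path: returns quietly if the resource was already released. */
    void forceCompletion() {
        if (writer == null) {
            return;
        }
        writer.append("forced");
        writer = null; // release the resource
    }

    /** Unguarded path: dereferences the resource without a null check. */
    void stop() {
        writer.append("stopped"); // throws NPE if forceCompletion() ran first
        writer = null;
    }
}

class StopOrderSketch {
    /** Runs the action and reports whether it threw a NullPointerException. */
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Order that fails: forced completion first, then stop. */
    static boolean forceThenStop() {
        SketchHandler h = new SketchHandler();
        return throwsNpe(() -> { h.forceCompletion(); h.stop(); });
    }

    /** Order that passes: stop first, then the guarded forced completion. */
    static boolean stopThenForce() {
        SketchHandler h = new SketchHandler();
        return throwsNpe(() -> { h.stop(); h.forceCompletion(); });
    }

    public static void main(String[] args) {
        System.out.println("force then stop throws NPE: " + forceThenStop());
        System.out.println("stop then force throws NPE: " + stopThenForce());
    }
}
```

In this sketch only the first ordering throws, which mirrors why the failure depends on which call happens first rather than on thread timing.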
hook.run() is also called in testDeletionofStagingOnKill(), but we do not get
the NPE in that case. The reason is that this test allows 4 app attempts:
_MRAppMaster appMaster = new TestMRApp(attemptId, mockAlloc, 4);_
whereas testDeletionofStagingOnKillLastTry() allows only 1 attempt, to make
sure there is no retry:
_MRAppMaster appMaster = new TestMRApp(attemptId, mockAlloc, 1); //no retry_
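
The effect of the attempt count can be sketched as follows. This is an illustration under assumptions, not the MRAppMaster logic: the isLastRetry helper, the rule "an attempt is the last one once its number reaches the maximum", and the attempt number 1 are all hypothetical.

```java
class LastRetrySketch {
    /** Assumed rule: an attempt is the last one once it reaches the maximum. */
    static boolean isLastRetry(int attemptNumber, int maxAttempts) {
        return attemptNumber >= maxAttempts;
    }

    public static void main(String[] args) {
        // 4 attempts allowed, as in testDeletionofStagingOnKill()
        System.out.println("max 4: last retry = " + isLastRetry(1, 4));
        // 1 attempt allowed, as in testDeletionofStagingOnKillLastTry()
        System.out.println("max 1: last retry = " + isLastRetry(1, 1));
    }
}
```

Under this assumption, only the single-attempt configuration is treated as a last retry, so only that test reaches the forced-completion path before the stop.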
bq. Please determine if the java7 label is still accurate based on your analysis
We still need the java7 label. Without this fix, TestStagingCleanup does not
fail every time; it fails only when the tests run in a particular order, which
happens with JDK7's random ordering.
> MRAppMaster causes TestStagingCleanup to fail intermittently with JDK7
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-5688
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5688
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Mit Desai
> Assignee: Mit Desai
> Labels: java7
> Attachments: MAPREDUCE-5688.patch
>
>
> Due to random ordering in JDK7, the test
> TestStagingCleanup#testDeletionofStagingOnKillLastTry is failing:
> {noformat}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.231 sec <<< FAILURE!
> test(org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup) Time elapsed: 3882 sec <<< ERROR!
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:349)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
> at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1399)
> at org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup.testDeletionofStagingOnKillLastTry(TestStagingCleanup.java:239)
> at org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup.test(TestStagingCleanup.java:82)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:242)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:137)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> {noformat}