[
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529363#comment-13529363
]
Xuan Gong commented on MAPREDUCE-4870:
--------------------------------------
The code looks good to me. It definitely gets out of the infinite loop. I
checked RMAppImpl, and it does not contain a transition from the FAILED state
to the FINISHED state. So this line, Assert.assertEquals(RMAppState.FINISHED,
mrCluster.getResourceManager().getRMContext().getRMApps().get(appID).getState()),
will always fail in this case.
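To illustrate why the loop has to stop on any terminal state rather than only on FINISHED, here is a minimal, self-contained sketch. It is not the code from the patch: the AppState enum is a simplified, hypothetical stand-in for RMAppState, and waitForTerminalState, maxPolls and sleepMillis are names made up for this example.

```java
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class TerminalStatePoll {
    // Hypothetical, simplified stand-in for RMAppState (assumption, not the YARN enum).
    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // States the application can never leave once it has entered them.
    static final Set<AppState> TERMINAL =
        EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    // Polls until a terminal state is seen or maxPolls reads have been made,
    // and returns the last state observed. The poll bound is an extra guard
    // so the loop ends even if no terminal state is ever reported.
    static AppState waitForTerminalState(Supplier<AppState> source,
                                         int maxPolls, long sleepMillis) {
        AppState state = source.get();
        for (int i = 1; i < maxPolls && !TERMINAL.contains(state); i++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up polling.
                Thread.currentThread().interrupt();
                return state;
            }
            state = source.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulated application that fails instead of finishing.
        Iterator<AppState> failing =
            List.of(AppState.ACCEPTED, AppState.RUNNING, AppState.FAILED).iterator();
        AppState last = waitForTerminalState(failing::next, 10, 1L);
        System.out.println(last); // prints FAILED
    }
}
```

With a loop of this shape, a failed job leaves the poll promptly, and the assertion on FINISHED then reports the failure instead of the test hanging.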
It looks like the failure occurs because we cannot launch the container:
2012-12-11 12:02:14,938 INFO [ContainersLauncher #0]
nodemanager.DefaultContainerExecutor
(DefaultContainerExecutor.java:launchContainer(175)) - launchContainer: [bash,
/Users/xgong/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService-localDir-nm-0_0/usercache/root/appcache/application_1355256124849_0001/container_1355256124849_0001_01_000001/default_container_executor.sh]
It will return a non-zero exit code, 127.
That then causes the subsequent AM and application failure.
> TestMRJobsWithHistoryService causes infinite loop if it fails
> -------------------------------------------------------------
>
> Key: MAPREDUCE-4870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0, trunk-win
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and
> sleep after job execution, checking for the application state to reach
> {{RMAppState#FINISHED}}. If the job fails, then the application could be in
> a different terminal state, and this polling loop will never terminate.