[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529363#comment-13529363
 ] 

Xuan Gong commented on MAPREDUCE-4870:
--------------------------------------

The code looks good to me. It definitely gets out of the infinite loop. I 
checked RMAppImpl, and it does not contain a transition from the failed state 
to the finished state. So this line:
Assert.assertEquals(RMAppState.FINISHED, 
mrCluster.getResourceManager().getRMContext().getRMApps().get(appID).getState())
will always fail in this case. 
It looks like the failure occurs because we cannot launch the container:
2012-12-11 12:02:14,938 INFO  [ContainersLauncher #0] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(175)) - launchContainer: [bash, 
/Users/xgong/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService-localDir-nm-0_0/usercache/root/appcache/application_1355256124849_0001/container_1355256124849_0001_01_000001/default_container_executor.sh]
The launch returns a non-zero exit code, 127, 
which then causes the subsequent AM and application failure.
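
For illustration, here is a minimal sketch of a polling loop that stops on any 
terminal state and is also bounded by a maximum number of polls. It is not the 
code from the attached patch. AppState is a hypothetical stand-in for 
RMAppState, and waitForTerminalState, maxPolls and sleepMillis are names made 
up for this example.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

public class TerminalStatePoll {
    // Hypothetical stand-in for YARN's RMAppState; not the real enum.
    public enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // States after which the application is assumed never to change again.
    public static final Set<AppState> TERMINAL =
        EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    // Polls stateSource at least once and at most maxPolls times, sleeping
    // between polls, and returns the last state seen. The loop ends on ANY
    // terminal state, so a failed job cannot keep it spinning forever.
    public static AppState waitForTerminalState(
            Supplier<AppState> stateSource, int maxPolls, long sleepMillis)
            throws InterruptedException {
        AppState state = stateSource.get();
        for (int i = 1; i < maxPolls && !TERMINAL.contains(state); i++) {
            Thread.sleep(sleepMillis);
            state = stateSource.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated application that fails instead of finishing.
        Iterator<AppState> failing =
            Arrays.asList(AppState.ACCEPTED, AppState.RUNNING, AppState.FAILED).iterator();
        AppState result = waitForTerminalState(failing::next, 10, 1L);
        System.out.println(result);
    }
}
```

With a loop like this, the test can assert on the returned state afterwards, so 
a failed application shows up as an assertion failure instead of a hang.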
                
> TestMRJobsWithHistoryService causes infinite loop if it fails
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-4870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0, trunk-win
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
> sleep after job execution, checking for the application state to reach 
> {{RMAppState#FINISHED}}.  If the job fails, then the application could be in 
> a different terminal state, and this polling loop will never terminate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
