Jian He updated MAPREDUCE-5466:
-------------------------------
Attachment: MAPREDUCE-5466.1.patch
The new patch skips writing history files on both REBOOT and ERROR events,
except during the last AM attempt.
I ran a manual test on a single-node cluster. To reproduce the problem, put a
sleep inside MRAppMaster.shutDownJob() before the call to
MRAppMaster.this.stop(), so that after the RM restarts, the
JobUnsuccessfulCompletionEvent generated in InternalRebootTransition has a
chance to be processed by JobHistoryEventHandler before the MR application
master actually exits. The test passed with the patch and failed without it.
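The skip rule described in this comment can be sketched roughly as follows. This is only an illustration of the decision, not the code in the patch: the class HistoryFlushPolicy, the enum EndState, and the method shouldWriteFinalHistory are hypothetical names that do not exist in Hadoop.

```java
// Hypothetical sketch of the rule "on REBOOT or ERROR, skip writing history
// files unless this is the last AM attempt". None of these names come from
// Hadoop or from the attached patch.
class HistoryFlushPolicy {

    // Simplified stand-in for the ways a job attempt can end.
    enum EndState { SUCCEEDED, FAILED, KILLED, REBOOT, ERROR }

    // Returns true if the final history files should be written for this attempt.
    static boolean shouldWriteFinalHistory(EndState state, boolean isLastAMAttempt) {
        boolean mayBeRetried = state == EndState.REBOOT || state == EndState.ERROR;
        // A REBOOT or ERROR attempt may be followed by another AM attempt, so an
        // unsuccessful result is recorded only when no further attempt will run.
        return !mayBeRetried || isLastAMAttempt;
    }

    public static void main(String[] args) {
        System.out.println(shouldWriteFinalHistory(EndState.REBOOT, false));
        System.out.println(shouldWriteFinalHistory(EndState.REBOOT, true));
        System.out.println(shouldWriteFinalHistory(EndState.SUCCEEDED, false));
    }
}
```

Under this assumption, an attempt interrupted by an RM restart leaves no FAILED record behind, so the history written by the later successful attempt is the one the history server reports.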
> Historyserver does not refresh the result of restarted jobs after RM restart
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-5466
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5466
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: yeshavora
> Assignee: Jian He
> Attachments: MAPREDUCE-5466.1.patch, MAPREDUCE-5466.patch
>
>
> Restart the RM while a sort job is running and verify that the job completes
> successfully after the RM restarts.
> Once the job finishes successfully, run the job status command for the sort
> job. It shows "Job state: FAILED". The job history server does not update the
> result for a job that was restarted after the RM restart.
> hadoop job -status job_1375923346354_0003
> 13/08/08 01:24:13 INFO mapred.ClientServiceDelegate: Application state is
> completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> Job: job_1375923346354_0003
> Job File:
> hdfs://host1:port1/history/done/2013/08/08/000000/job_1375923346354_0003_conf.xml
> Job Tracking URL :
> http://historyserver:port2/jobhistory/job/job_1375923346354_0003
> Uber job : false
> Number of maps: 80
> Number of reduces: 1
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due
> to some other reason and reason can be found in the logs.
> Counters not available. Job is retired.