[jira] [Updated] (MAPREDUCE-5466) Historyserver does not refresh the result of restarted jobs after RM restart

Jian He (JIRA) Mon, 19 Aug 2013 19:27:00 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jian He updated MAPREDUCE-5466:
-------------------------------

    Attachment: MAPREDUCE-5466.1.patch

The new patch skips writing history files on both REBOOT and ERROR events, 
except during the last AM attempt.

Tested manually on a single-node cluster. To reproduce the problem, put a 
sleep inside MRAppMaster.shutDownJob() before the call to 
MRAppMaster.this.stop(), so that after the RM restarts, the 
JobUnsuccessfulCompletionEvent generated in InternalRebootTransition has a 
chance to be processed by JobHistoryEventHandler before the MR AM actually 
exits. The test passes with the patch and fails without it.
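
To make the rule concrete, here is a minimal, self-contained sketch of the 
decision described above. It is not the code in the attached patch: the class 
HistoryWriteSketch, the enum TerminalEvent and the method shouldWriteHistory 
are hypothetical names, and the assumption that every other terminal event 
always writes history is mine, not something stated in this issue.

```java
public class HistoryWriteSketch {

    /** Hypothetical stand-ins for the ways an AM attempt can end. */
    public enum TerminalEvent { SUCCEEDED, FAILED, KILLED, REBOOT, ERROR }

    /**
     * Returns true if this AM attempt should write job history files.
     * REBOOT and ERROR may be followed by another AM attempt, so history
     * is skipped for them unless this is the last attempt.
     * Assumption: all other terminal events always write history.
     */
    public static boolean shouldWriteHistory(TerminalEvent event,
                                             boolean isLastAMAttempt) {
        boolean mayBeRetried =
                event == TerminalEvent.REBOOT || event == TerminalEvent.ERROR;
        return !mayBeRetried || isLastAMAttempt;
    }

    public static void main(String[] args) {
        // Print the decision for every event, for non-final and final attempts.
        for (TerminalEvent event : TerminalEvent.values()) {
            for (boolean last : new boolean[] {false, true}) {
                System.out.println(event + " lastAMAttempt=" + last
                        + " -> writeHistory=" + shouldWriteHistory(event, last));
            }
        }
    }
}
```

With this rule, a REBOOT caused by an RM restart on a non-final attempt leaves 
no FAILED record behind, so the history written by the later, successful 
attempt is the only result the history server sees.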
                
> Historyserver does not refresh the result of restarted jobs after RM restart
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5466
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: yeshavora
>            Assignee: Jian He
>         Attachments: MAPREDUCE-5466.1.patch, MAPREDUCE-5466.patch
>
>
> Restart the RM while a sort job is running and verify that the job completes 
> successfully after the RM restarts. 
> Once the job finishes successfully, run the job status command for the sort 
> job. It shows "Job state: FAILED". The job history server does not update the 
> result for a job that was restarted after an RM restart.
> hadoop job -status job_1375923346354_0003
> 13/08/08 01:24:13 INFO mapred.ClientServiceDelegate: Application state is 
> completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> Job: job_1375923346354_0003
> Job File: 
> hdfs://host1:port1/history/done/2013/08/08/000000/job_1375923346354_0003_conf.xml
> Job Tracking URL : 
> http://historyserver:port2/jobhistory/job/job_1375923346354_0003
> Uber job : false
> Number of maps: 80
> Number of reduces: 1
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters not available. Job is retired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
