[
https://issues.apache.org/jira/browse/MAPREDUCE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727367#comment-13727367
]
Rohith Sharma K S commented on MAPREDUCE-5441:
----------------------------------------------
Jian, thank you for your brief explanation.
I am seeing this problem on a Linux machine. The scenario is "the Node Manager
running the app master is abruptly killed (kill -9 NM_PID) and restarted".
In the above test case, the RM issues a reboot to the 1st-attempt app master.
At this point, the job status is set to ERROR and a JobHistoryEvent is
triggered. The JobClient is still connected to the 1st app master for job
reports, so the client gets a job report with job status ERROR.
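
To make the sequence concrete, here is a minimal sketch in plain Java with no
Hadoop dependencies. The class ClientPollSketch, its State enum, the Report
type, and both poll methods are hypothetical names made up for illustration;
they are not the JobClient API. The sketch assumes the client sees a stream of
(attempt, state) reports, and contrasts a client that stops on the first
non-RUNNING state with one that treats ERROR as final only when it comes from
the last allowed attempt.

```java
import java.util.Arrays;
import java.util.List;

public class ClientPollSketch {
    // Hypothetical job states; not a copy of Hadoop's own state enum.
    enum State { RUNNING, ERROR, SUCCEEDED }

    // One job report as seen by the client: which AM attempt answered, and the state.
    static final class Report {
        final int attempt;
        final State state;
        Report(int attempt, State state) {
            this.attempt = attempt;
            this.state = state;
        }
    }

    // Behaviour described in the comment: the first non-RUNNING state ends polling.
    static State pollNaive(List<Report> reports) {
        for (Report r : reports) {
            if (r.state != State.RUNNING) {
                return r.state;
            }
        }
        return State.RUNNING;
    }

    // Assumed alternative: ERROR is final only if it comes from the last allowed attempt.
    static State pollAttemptAware(List<Report> reports, int maxAttempts) {
        for (Report r : reports) {
            if (r.state == State.SUCCEEDED) {
                return r.state;
            }
            if (r.state == State.ERROR && r.attempt >= maxAttempts) {
                return r.state;
            }
        }
        return State.RUNNING;
    }

    public static void main(String[] args) {
        // 1st attempt is rebooted by the RM (reports ERROR); 2nd attempt completes the job.
        List<Report> reports = Arrays.asList(
            new Report(1, State.RUNNING),
            new Report(1, State.ERROR),
            new Report(2, State.RUNNING),
            new Report(2, State.SUCCEEDED));
        System.out.println(pollNaive(reports));           // ERROR
        System.out.println(pollAttemptAware(reports, 2)); // SUCCEEDED
    }
}
```

The attempt-aware variant is only one possible way to think about the problem,
not a proposed patch.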
> JobClient exit whenever RM issue Reboot command to 1st attempt App Master.
> --------------------------------------------------------------------------
>
> Key: MAPREDUCE-5441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5441
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, client
> Affects Versions: 2.1.1-beta
> Reporter: Rohith Sharma K S
>
> When the RM issues a Reboot command to the app master, the app master shuts
> down gracefully. All the history events are written to HDFS with the job
> status set to ERROR. The JobClient gets the job state as ERROR and exits.
> But the RM launches a 2nd-attempt app master, for which no client is left to
> get the job status. In the RM UI, the job status is displayed as SUCCESS, but
> for the client the job has Failed.