[
https://issues.apache.org/jira/browse/MAPREDUCE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726954#comment-13726954
]
Jian He commented on MAPREDUCE-5441:
------------------------------------
I debugged this for a while. If you are running on Mac, you will probably hit
this problem (see YARN-76). If you are running on Linux, you should not see it.
The reason is:
After the RM restarts, and before the Reboot command is actually processed by
the AM, the AM gets an AMRMToken invalid exception, since the AMRMToken is now
used in non-secure mode. The MR AM currently handles this exception by simply
ignoring it and retrying, which is essentially an infinite loop.
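
To illustrate the difference between retrying forever and treating the token
failure as fatal, here is a minimal sketch. It is not the MR AM code: the class
HeartbeatRetrySketch, the InvalidTokenException type, the Heartbeat interface,
and the Outcome values are all hypothetical names made up for this example, and
the retry limit is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatRetrySketch {

    /** Hypothetical stand-in for the "AMRMToken invalid" failure; not a Hadoop class. */
    public static class InvalidTokenException extends Exception {
        public InvalidTokenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for one AM-to-RM heartbeat call. */
    public interface Heartbeat {
        void send() throws Exception;
    }

    public enum Outcome { OK, SHUTDOWN_INVALID_TOKEN, GAVE_UP }

    /**
     * Sends a heartbeat, retrying ordinary failures up to maxAttempts times.
     * An invalid-token failure is treated as fatal instead of being ignored,
     * so the loop cannot spin forever on a token the RM will never accept.
     */
    public static Outcome heartbeatWithLimit(Heartbeat hb, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                hb.send();
                return Outcome.OK;
            } catch (InvalidTokenException e) {
                // Retrying cannot help here: stop and let the caller shut down.
                return Outcome.SHUTDOWN_INVALID_TOKEN;
            } catch (Exception e) {
                // Possibly transient: try again until the attempts run out.
            }
        }
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Outcome outcome = heartbeatWithLimit(() -> {
            calls.incrementAndGet();
            throw new InvalidTokenException("token rejected after RM restart");
        }, 5);
        System.out.println(outcome + " after " + calls.get() + " call(s)");
    }
}
```

With the invalid-token case handled separately, the simulated heartbeat is
attempted once and the loop exits, rather than retrying indefinitely.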
On Linux, the AM process is quickly killed by the NM sending the signal, and
the RM launches a new AM; during this time the JobClient retries and eventually
switches to the new AM. On Mac, however, the AM process is probably still
hanging around. As a result, the JobClient keeps talking to the old AM, and the
old AM eventually tells the client that the job failed. I tested this on a real
cluster and saw that the JobClient hangs for a while and eventually continues
reporting job progress.
> JobClient exit whenever RM issue Reboot command to 1st attempt App Master.
> --------------------------------------------------------------------------
>
> Key: MAPREDUCE-5441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5441
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, client
> Affects Versions: 2.1.1-beta
> Reporter: Rohith Sharma K S
>
> When the RM issues a Reboot command to the app master, the app master shuts
> down gracefully. All the history events are written to HDFS with the job
> status set to ERROR. The JobClient gets the job state as ERROR and exits.
> But the RM launches a 2nd attempt app master, with no client there to get the
> job status. In the RM UI, the job status is displayed as SUCCESS, but for the
> client the job has failed.