[
https://issues.apache.org/jira/browse/MAPREDUCE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507718#comment-14507718
]
Jason Lowe commented on MAPREDUCE-6329:
---------------------------------------
bq. This is probably because the NM was down for some time during the rolling
upgrade. So a Node_Removed event might have occurred, either because of expiry
or because of a reconnected event. The Node_Removed event kills all the running
containers, which was done before the container was pulled by the AM.
That doesn't add up, since the container was just allocated by the node
heartbeating in. Therefore I don't see how the RM could reasonably be expiring
the node, nor should the node be unregistering. Re-registration does _not_
kill containers on the node. If it did then NM restart could not possibly
work, since the NM re-registers when it starts up.
> Failure of start map task on NM cause job hang
> ----------------------------------------------
>
> Key: MAPREDUCE-6329
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6329
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Peng Zhang
> Attachments: syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, the AM failed to start a container on an NM,
> and the job then hung.
> AM logs are attached.