[
https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198180#comment-13198180
]
Eric Payne commented on MAPREDUCE-3034:
---------------------------------------
Hi Devaraj,
Thanks for updating the patch.
It looks good to me with one exception. The more I think about it, the more I
think we should re-read the NodeManager configs when the NM restarts. My
reasoning is that, in this particular use case, the RM will have a new version
of the configs, and you usually want the RM's configs and the NM's configs to
match.
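
To illustrate the behavior suggested above, here is a minimal, language-neutral
sketch in Python. It is not the patch and not the Hadoop API: the class
NodeManagerSketch, the load_conf callable, and the "rm.address" key are all
hypothetical names used only to show a restart path that re-reads its configs
instead of reusing the copy loaded at first startup.

```python
from typing import Callable, Dict


class NodeManagerSketch:
    """Hypothetical stand-in for the NM; none of these names are Hadoop APIs."""

    def __init__(self, load_conf: Callable[[], Dict[str, str]]) -> None:
        self._load_conf = load_conf  # assumed hook that reads configs from disk
        self.conf: Dict[str, str] = {}
        self.running = False
        self.start()

    def start(self) -> None:
        # Re-read the configuration on every (re)start, so a rebooted NM
        # picks up whatever is currently on disk.
        self.conf = dict(self._load_conf())
        self.running = True

    def stop(self) -> None:
        # A real NM would stop its services and clean up containers here.
        self.running = False

    def handle_heartbeat_response(self, action: str) -> None:
        # Act on a REBOOT order from the RM: full stop, then fresh start.
        if action == "REBOOT":
            self.stop()
            self.start()
```

With this shape, a config change made between the first start and the REBOOT
is visible after the reboot, which is the matching-configs property described
above.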
Other than that, I am happy with the patch. I downloaded it and tested it in
both a one-node simple cluster and a 10-node secure cluster. I restarted
the RM several times and checked the heap dump to look for memory leaks, and I
didn't see any. I also ran about 100 wordcount tests after restarting the
RM/NMs.
> NM should act on a REBOOT command from RM
> -----------------------------------------
>
> Key: MAPREDUCE-3034
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Devaraj K
> Priority: Critical
> Attachments: MAPREDUCE-3034-1.patch, MAPREDUCE-3034.patch, MR-3034.txt
>
>
> RM sends a reboot command to NM in some cases, such as when the NM gets lost
> and then rejoins. In such a case, NM should act on the command and
> reboot/reinitialize itself.
> This is akin to TT reinitializing on order from JT. We will need to shut down
> all the services properly and reinitialize. This should automatically take
> care of killing containers, cleaning up local temporary files, etc.