[ https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194862#comment-13194862 ]
Eric Payne commented on MAPREDUCE-3034:
---------------------------------------
@Devaraj
That's fine if you want to take it over. When do you think you can get a patch
up? I was hoping to get this going within the next week.
From my point of view, the basic requirement is to be able to bounce the RM
without having to manually restart every single NM in a very large cluster
(thousands of NMs).
Right now, when the NM gets the reboot command from the RM, it just calls the stop
hooks, exactly as it does for a shutdown command. My plan is that when the NM gets a
reboot command, it still executes the shutdown hook, but I would then add a reboot
hook that executes the same basic startup code as NodeManager.main().
Is that your basic plan?
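A minimal Java sketch of the plan above, assuming a simplified lifecycle: on REBOOT the NM runs the same stop path it uses for SHUTDOWN, then a reboot hook reruns the startup sequence on a fresh instance. NodeAction, SketchNodeManager, startNodeManager() and onHeartbeat() are hypothetical names for illustration only; they are not the actual YARN classes or the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are real Hadoop/YARN APIs.
public class RebootHookSketch {

    // Hypothetical stand-in for the action the RM returns in a heartbeat response.
    enum NodeAction { NORMAL, SHUTDOWN, REBOOT }

    // Records lifecycle events so the behaviour is observable.
    static final List<String> events = new ArrayList<>();

    // Hypothetical minimal node manager with a start/stop lifecycle.
    static class SketchNodeManager {
        final int generation;

        SketchNodeManager(int generation) { this.generation = generation; }

        void start() { events.add("start#" + generation); }

        // The "stop hooks": the same path is used for SHUTDOWN and REBOOT.
        void stop() { events.add("stop#" + generation); }
    }

    // Stand-in for the startup sequence in main(): build and start a fresh instance.
    static SketchNodeManager startNodeManager(int generation) {
        SketchNodeManager nm = new SketchNodeManager(generation);
        nm.start();
        return nm;
    }

    // Handles one heartbeat response; returns the NM that keeps running, or null after a shutdown.
    static SketchNodeManager onHeartbeat(SketchNodeManager nm, NodeAction action) {
        switch (action) {
            case SHUTDOWN:
                nm.stop();
                return null;
            case REBOOT:
                nm.stop();                                  // same stop hooks as a shutdown
                return startNodeManager(nm.generation + 1); // reboot hook: rerun startup
            default:
                return nm;                                  // NORMAL: keep running
        }
    }

    public static void main(String[] args) {
        SketchNodeManager nm = startNodeManager(1);
        nm = onHeartbeat(nm, NodeAction.NORMAL);
        nm = onHeartbeat(nm, NodeAction.REBOOT);
        System.out.println(events); // [start#1, stop#1, start#2]
    }
}
```

Creating a fresh instance in the reboot hook, rather than restarting the old object, avoids carrying any half-stopped state across the reboot.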
I have already written a proof-of-concept patch and tested it on a 10-node
secure cluster. To test it, I shut down the RM and restarted it. After the restart,
I ran an hour's worth of jobs and compared run times and heap sizes before
and after the restart. They all looked good to me.
Thanks,
-Eric
> NM should act on a REBOOT command from RM
> -----------------------------------------
>
> Key: MAPREDUCE-3034
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Devaraj K
> Attachments: MR-3034.txt
>
>
> RM sends a reboot command to the NM in some cases, for example when the NM gets
> lost and then rejoins. In such a case, the NM should act on the command and
> reboot/reinitialize itself.
> This is akin to a TT reinitializing on order from the JT. We will need to shut down
> all the services properly and reinitialize; this should automatically take
> care of killing containers, cleaning up local temporary files, etc.
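A rough sketch of "shut down all the services properly and reinitialize", assuming the NM is a composite of child services that are started in order and stopped in reverse order, so that stopping a container service kills running containers and stopping a local-dirs service removes temporary files. All class names here (ReinitSketch, ContainerService, LocalDirsService, CompositeNode) and the sample path are hypothetical, not NodeManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not the actual NodeManager classes.
public class ReinitSketch {

    static final List<String> log = new ArrayList<>();

    // Minimal lifecycle contract assumed for each NM sub-service.
    interface Service {
        void start();
        void stop();
    }

    // Hypothetical service that tracks running containers and kills them on stop.
    static class ContainerService implements Service {
        final List<String> running = new ArrayList<>();
        public void start() { log.add("start containers"); }
        public void stop() {
            for (String c : running) log.add("kill " + c);
            running.clear();
            log.add("stop containers");
        }
    }

    // Hypothetical service that removes local temporary files on stop.
    static class LocalDirsService implements Service {
        final List<String> tempFiles = new ArrayList<>();
        public void start() { log.add("start localdirs"); }
        public void stop() {
            for (String f : tempFiles) log.add("delete " + f);
            tempFiles.clear();
            log.add("stop localdirs");
        }
    }

    // Composite: starts children in order and stops them in reverse order.
    static class CompositeNode {
        final List<Service> services = new ArrayList<>();
        void start() { for (Service s : services) s.start(); }
        void stop() {
            for (int i = services.size() - 1; i >= 0; i--) services.get(i).stop();
        }
        // Reinitialize = full stop followed by a fresh start of every child.
        void reinitialize() { stop(); start(); }
    }

    public static void main(String[] args) {
        LocalDirsService dirs = new LocalDirsService();
        ContainerService containers = new ContainerService();
        CompositeNode nm = new CompositeNode();
        nm.services.add(dirs);       // started first, stopped last
        nm.services.add(containers); // started last, stopped first
        nm.start();
        dirs.tempFiles.add("/tmp/job_1");
        containers.running.add("container_1");
        nm.reinitialize();
        System.out.println(log);
    }
}
```

Stopping in reverse order means the container service is torn down before the local-dirs cleanup runs, so temporary files are not deleted out from under containers that are still running.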