This is clearly a bug on the NM side, which doesn't handle the REBOOT command from the RM.
But can you upload the RM side logs related to this node so that we are sure there aren't any bugs in RM?

Thanks!

On Mon, Sep 19, 2011 at 3:46 PM, Devaraj K (JIRA) <[email protected]> wrote:

> RM is not processing heartbeat and continuously giving the message 'Node
> not found rebooting'
>
> ---------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3030
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3030
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 0.24.0
> Reporter: Devaraj K
> Assignee: Devaraj K
> Priority: Blocker
>
>
> {code:title=Node Manager Logs|borderStyle=solid}
> 2011-09-19 13:39:29,816 INFO webapp.WebApps (WebApps.java:start(162)) -
> Registered webapp guice modules
> 2011-09-19 13:39:29,817 INFO service.AbstractService
> (AbstractService.java:start(61)) -
> Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is
> started.
> 2011-09-19 13:39:29,818 INFO service.AbstractService
> (AbstractService.java:start(61)) - Service:Dispatcher is started.
> 2011-09-19 13:39:29,819 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:start(133)) - Configured ContainerManager
> Address is 10.18.52.124:45454
> 2011-09-19 13:39:29,819 INFO ipc.YarnRPC (YarnRPC.java:create(47)) -
> Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 2011-09-19 13:39:29,822 INFO ipc.HadoopYarnRPC
> (HadoopYarnProtoRPC.java:getProxy(49)) - Creating a HadoopYarnProtoRpc proxy
> for protocol interface org.apache.hadoop.yarn.server.api.ResourceTracker
> 2011-09-19 13:39:29,862 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:registerWithRM(165)) - Connected to
> ResourceManager at 0.0.0.0:8025
> 2011-09-19 13:39:30,369 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:registerWithRM(189)) - Registered with
> ResourceManager as 10.18.52.124:45454 with total resource of memory: 8192,
> 2011-09-19 13:39:30,369 INFO service.AbstractService
> (AbstractService.java:start(61)) -
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is
> started.
> 2011-09-19 13:39:30,371 INFO service.AbstractService
> (AbstractService.java:start(61)) -
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
> {code}
>
> {code:title=Resource Manager Logs|borderStyle=solid}
> 2011-09-19 14:01:03,238 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> Call:
> protocol=org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$BlockingInterface,
> method=nodeHeartbeat
> 2011-09-19 14:01:04,240 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> Call:
> protocol=org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$BlockingInterface,
> method=nodeHeartbeat
> 2011-09-19 14:01:05,242 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> Call:
> protocol=org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$BlockingInterface,
> method=nodeHeartbeat
> 2011-09-19 14:01:06,244 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> Call:
> protocol=org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$BlockingInterface,
> method=nodeHeartbeat
> 2011-09-19 14:01:07,246 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> Call:
> protocol=org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$BlockingInterface,
> method=nodeHeartbeat
> 2011-09-19 14:01:08,247 INFO resourcemanager.ResourceTrackerService
> (ResourceTrackerService.java:nodeHeartbeat(201)) - Node not found rebooting
> 10.18.52.124:45454
> {code}
>
> Node Manager is registered with Resource manager and then, for every
> heartbeat, it is printing the above message.
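
To make the REBOOT handling mentioned at the top of this reply concrete, here is a minimal, self-contained sketch of the behaviour one would expect: when the RM answers a heartbeat with a reboot action because it does not know the node, the NM acts on it instead of heartbeating forever. Everything in it (ToyResourceManager, ToyNodeManager, NodeAction, heartbeatOnce) is a hypothetical stand-in, not the actual YARN classes or the actual fix, and it assumes that re-registering is an adequate reaction; a real NM would probably also have to clean up its containers or restart its services.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these types are the real YARN classes. */
public class HeartbeatRebootSketch {

    /** Actions the RM can return in a heartbeat response (names assumed). */
    enum NodeAction { NORMAL, REBOOT }

    /** Toy RM that only tracks which node IDs are currently registered. */
    static class ToyResourceManager {
        private final Set<String> registered = new HashSet<>();

        void register(String nodeId) {
            registered.add(nodeId);
        }

        /** Simulates the RM losing track of its nodes (one assumed cause). */
        void forgetAllNodes() {
            registered.clear();
        }

        /** An unknown node is told to reboot, mirroring "Node not found rebooting". */
        NodeAction heartbeat(String nodeId) {
            return registered.contains(nodeId) ? NodeAction.NORMAL : NodeAction.REBOOT;
        }
    }

    /** Toy NM that reacts to REBOOT by registering again. */
    static class ToyNodeManager {
        private final String nodeId;
        private final ToyResourceManager rm;
        int registrations = 0;

        ToyNodeManager(String nodeId, ToyResourceManager rm) {
            this.nodeId = nodeId;
            this.rm = rm;
        }

        void start() {
            registerWithRM();
        }

        private void registerWithRM() {
            rm.register(nodeId);
            registrations++;
        }

        /** Sends one heartbeat and returns the action the RM replied with. */
        NodeAction heartbeatOnce() {
            NodeAction action = rm.heartbeat(nodeId);
            if (action == NodeAction.REBOOT) {
                // Act on REBOOT instead of ignoring it (assumed reaction: re-register).
                registerWithRM();
            }
            return action;
        }
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        ToyNodeManager nm = new ToyNodeManager("10.18.52.124:45454", rm);
        nm.start();

        System.out.println(nm.heartbeatOnce()); // NORMAL
        rm.forgetAllNodes();
        System.out.println(nm.heartbeatOnce()); // REBOOT, NM re-registers
        System.out.println(nm.heartbeatOnce()); // NORMAL again
    }
}
```

Without the REBOOT branch in heartbeatOnce, the toy NM would get REBOOT on every heartbeat after forgetAllNodes, which is the repeating pattern in the RM log above. The sketch says nothing about why the RM did not know the node in the first place, which is what the RM-side logs should help settle.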
