[ https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113309#comment-13113309 ]
Devaraj K commented on MAPREDUCE-3070:
--------------------------------------
Whenever the NodeManager is restarted for any reason (such as cluster
maintenance), it currently has to wait until
"yarn.resourcemanager.nm.liveness-monitor.expiry-interval-ms" (10 minutes by
default) has elapsed before it can register again. Decreasing the default is
not a feasible fix either.

Proposal: clean up the existing entry for the node and register the NM even if
the registration request arrives before the previous NM instance has expired.
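
The proposal above can be sketched roughly as follows. This is a minimal,
self-contained illustration, not the actual ResourceManager code: the class
NodeRegistrySketch, its methods, and the Result enum are all hypothetical names
introduced here, and the real cleanup would involve more than dropping a map
entry.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for the RM-side bookkeeping of NMs.
 * None of these names come from the Hadoop code base.
 */
class NodeRegistrySketch {

    /** Outcome of a registration request. */
    enum Result { REGISTERED, REREGISTERED_AFTER_CLEANUP }

    // nodeId -> timestamp (ms) of the last registration
    private final Map<String, Long> liveNodes = new HashMap<>();
    private int cleanups = 0;

    /**
     * Registers a node. If the node is already known (for example because
     * the NM restarted before its old entry expired), the stale entry is
     * cleaned up and the node is registered again instead of being rejected
     * as a duplicate.
     */
    synchronized Result register(String nodeId, long nowMs) {
        if (liveNodes.containsKey(nodeId)) {
            cleanupNode(nodeId);            // assumed cleanup hook
            liveNodes.put(nodeId, nowMs);
            return Result.REREGISTERED_AFTER_CLEANUP;
        }
        liveNodes.put(nodeId, nowMs);
        return Result.REGISTERED;
    }

    /** Drops the stale entry; a real RM would release more state here. */
    private void cleanupNode(String nodeId) {
        liveNodes.remove(nodeId);
        cleanups++;
    }

    synchronized int cleanupCount() {
        return cleanups;
    }

    synchronized boolean isRegistered(String nodeId) {
        return liveNodes.containsKey(nodeId);
    }
}
```

With this shape, a second register() call for the same node ID within the
expiry interval succeeds after cleanup rather than failing with a duplicate
registration error.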
> NM not able to register with RM after NM restart
> ------------------------------------------------
>
> Key: MAPREDUCE-3070
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Reporter: Ravi Teja Ch N V
> Assignee: Devaraj K
> Priority: Blocker
>
> After stopping the NM gracefully and then starting it again, NM registration
> with the RM fails with a "Duplicate registration from the node!" error.
> {noformat}
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
>     at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
>     at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>     ... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
>     at $Proxy13.registerNodeManager(Unknown Source)
>     at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
>     ... 3 more
> {noformat}