[
https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131426#comment-13131426
]
Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-3070:
-----------------------------------------------------------
Hi Devaraj,
I raised a similar issue,
[MAPREDUCE-3178|https://issues.apache.org/jira/browse/MAPREDUCE-3178], some time
back.
I have one doubt about the patch you attached; please clarify.
Suppose the current thread has just passed the
{noformat}if (this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode) != null){noformat}
condition, and at the next moment the *ping checker* thread detects that the
node has expired, removes it from the *running* map, and calls the event
handler to handle the node EXPIRE event. The scheduler then removes the node
from its *nodes* map.
Now suppose NodeReconnectedSchedulerEvent(rmNode) is dispatched. While handling
it, the scheduler again tries to remove the node from the *nodes* map, gets
nothing back because the entry is already gone, and then tries to operate on
that object. The result, as we know, is an NPE.
Please correct me if I am wrong.
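To make the interleaving concrete, here is a minimal single-threaded sketch that replays the three steps in the order described above. It is not the ResourceManager or scheduler code: the class ReconnectRaceSketch, the SchedulerNode type, and the methods expire, reconnectUnsafe, and reconnectGuarded are all hypothetical stand-ins, and the null check is only one possible way to guard the reconnect path, not what the attached patch does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the race; none of these names come from Hadoop.
class ReconnectRaceSketch {

    // Stand-in for the scheduler's per-node bookkeeping object.
    static final class SchedulerNode {
        final String nodeId;
        SchedulerNode(String nodeId) { this.nodeId = nodeId; }
        String release() { return "released " + nodeId; }
    }

    // Stand-ins for the RM's node map and the scheduler's *nodes* map.
    static final ConcurrentMap<String, String> rmNodes = new ConcurrentHashMap<>();
    static final ConcurrentMap<String, SchedulerNode> schedulerNodes = new ConcurrentHashMap<>();

    // What the expiry path does: drop the node from both maps.
    static void expire(String nodeId) {
        rmNodes.remove(nodeId);
        schedulerNodes.remove(nodeId);
    }

    // Reconnect handling that assumes the node is still present.
    static String reconnectUnsafe(String nodeId) {
        SchedulerNode old = schedulerNodes.remove(nodeId); // null if already expired
        return old.release();                              // throws NPE in that case
    }

    // Reconnect handling that tolerates an earlier removal.
    static String reconnectGuarded(String nodeId) {
        SchedulerNode old = schedulerNodes.remove(nodeId);
        return old == null ? "already removed " + nodeId : old.release();
    }

    // Replays the interleaving: check passes, node expires, reconnect is handled.
    static String runInterleaving(boolean guarded) {
        String nodeId = "host1:1234";
        rmNodes.clear();
        schedulerNodes.clear();
        rmNodes.put(nodeId, "old registration");
        schedulerNodes.put(nodeId, new SchedulerNode(nodeId));

        // Step 1: the registering thread sees an existing entry.
        boolean alreadyRegistered = rmNodes.putIfAbsent(nodeId, "new registration") != null;

        // Step 2: the expiry thread runs before the reconnect event is handled.
        expire(nodeId);

        // Step 3: the reconnect event is handled against the now-empty map.
        if (!alreadyRegistered) {
            return "fresh registration";
        }
        try {
            return guarded ? reconnectGuarded(nodeId) : reconnectUnsafe(nodeId);
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + runInterleaving(false));
        System.out.println("guarded:   " + runInterleaving(true));
    }
}
```

In this sketch the unguarded handler hits the NPE and the guarded one does not; whether a null check, a lock around check-and-dispatch, or re-adding the node is the right fix for the real code is a separate question.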
> NM not able to register with RM after NM restart
> ------------------------------------------------
>
> Key: MAPREDUCE-3070
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.0
> Reporter: Ravi Teja Ch N V
> Assignee: Devaraj K
> Priority: Blocker
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-3070.patch
>
>
> After stopping the NM gracefully and then starting it again, NM registration
> with the RM fails with a "Duplicate registration from the node!" error.
> {noformat}
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
>     at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
>     at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>     ... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
>     at $Proxy13.registerNodeManager(Unknown Source)
>     at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
>     ... 3 more
> {noformat}