[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131426#comment-13131426
 ] 

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-3070:
-----------------------------------------------------------

Hi Devaraj,
 I raised a similar issue, 
[MAPREDUCE-3178|https://issues.apache.org/jira/browse/MAPREDUCE-3178], some time 
back. 

 I have one question about the patch you attached; please clarify.

 Suppose the current thread has passed the 
{noformat}if (this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode) != null) {noformat} 
condition, and the next moment the *ping checker* thread detects that the current 
node has expired, removes the node from the *running* map, and calls the event 
handler to handle the node EXPIRE event. The corresponding scheduler then removes 
the node from its *nodes* map.

 Suppose NodeReconnectedSchedulerEvent(rmNode) is now dispatched. While handling 
it, the scheduler again tries to remove the node from the *nodes* map and then 
operates on the returned object. Since the node has already been removed, the 
result is, as we know, an NPE.
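
 The interleaving described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual ResourceManager or scheduler code: SchedulerSketch, its methods, and the String-valued map are hypothetical stand-ins for the scheduler and its *nodes* map, and the null check is one possible guard rather than what the attached patch does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a scheduler; none of these names are real YARN classes.
class SchedulerSketch {
    // Plays the role of the scheduler's *nodes* map (nodeId -> node record).
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    void addNode(String nodeId) {
        nodes.put(nodeId, "node@" + nodeId);
    }

    // EXPIRE handling: the ping checker decided the node is dead, so drop it.
    void handleExpire(String nodeId) {
        nodes.remove(nodeId);
    }

    // Reconnect handling without a guard: remove() returns null when the
    // EXPIRE event was handled first, and the dereference below then throws.
    int handleReconnectUnguarded(String nodeId) {
        String old = nodes.remove(nodeId);
        return old.length(); // NullPointerException if the node already expired
    }

    // Reconnect handling with a guard: only touch the old record if it is
    // still present, then register the node again.
    // Returns true if an old record was found and cleaned up.
    boolean handleReconnectGuarded(String nodeId) {
        String old = nodes.remove(nodeId);
        boolean hadOldRecord = (old != null);
        // Cleanup of the old record's resources would go here when hadOldRecord is true.
        addNode(nodeId);
        return hadOldRecord;
    }

    boolean contains(String nodeId) {
        return nodes.containsKey(nodeId);
    }
}
```

 Calling handleExpire before handleReconnectUnguarded for the same node id reproduces the suspected NPE; handleReconnectGuarded tolerates the same ordering and simply re-adds the node.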

 Correct me if I am wrong. 

                
> NM not able to register with RM after NM restart
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3070
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3070.patch
>
>
> After stopping the NM gracefully and then starting it again, NM registration 
> with the RM fails with a "Duplicate registration from the node!" error.
> {noformat} 
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
>       at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
>       at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>       ... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
>       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
>       at $Proxy13.registerNodeManager(Unknown Source)
>       at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
>       ... 3 more
> {noformat} 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
