[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

B Anil Kumar updated MAPREDUCE-3676:
------------------------------------

    Attachment: MAPREDUCE-3676.patch

Currently, if the NM starts before the RM, the NM shuts down because 
registration fails.  But I *think* an NM can legitimately start before the RM, 
so instead of letting the NM go down, it would be better to keep retrying the 
RM until it is up.
 
The attached patch implements this.  Please let me know whether this approach 
is OK.
 
As for the second case, retrying the RM continuously when the RM becomes 
unavailable after both RM and NM have started seems correct.
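
To make the proposed behaviour concrete, here is a minimal sketch of a retry loop around registration.  It is not the code in the attached patch: the Registrar interface, the registerWithRetry method, the attempt limit and the retry interval are all hypothetical names chosen for illustration, and the "RM" in main is simulated.

```java
import java.net.ConnectException;

public class RegistrationRetry {

    /** Hypothetical stand-in for the NM's registration call to the RM. */
    interface Registrar {
        void register() throws ConnectException;
    }

    /**
     * Calls register() until it succeeds or maxAttempts is used up.
     * Returns the number of attempts made; rethrows the last
     * ConnectException if every attempt fails.
     */
    static int registerWithRetry(Registrar registrar, int maxAttempts,
                                 long retryIntervalMs)
            throws ConnectException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                registrar.register();
                return attempt;
            } catch (ConnectException e) {
                last = e;
                // Wait before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(retryIntervalMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated RM that refuses the first two connections.
        Registrar flakyRm = () -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused");
            }
        };
        int attempts = registerWithRetry(flakyRm, 10, 10L);
        System.out.println("Registered after " + attempts + " attempts");
    }
}
```

Whether the retry should be bounded, as in this sketch, or continue indefinitely until the RM is up is a design choice that is open for discussion here.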
                
> Nodemanager if started before starting Resource manager is getting 
> shutdown. But if both RM and NM are started and then after if RM is going 
> down, NM is retrying for the RM.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3676
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3676
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramgopal N
>         Attachments: MAPREDUCE-3676.patch
>
>
> If NM is started before starting the RM, NM is shutting down with the 
> following error:
> {code}
> ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
> services org.apache.hadoop.yarn.server.nodemanager.NodeManager
> org.apache.avro.AvroRuntimeException: 
> java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
>       at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
>       ... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
> Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
>       at $Proxy23.registerNodeManager(Unknown Source)
>       at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>       ... 5 more
> Caused by: java.net.ConnectException: Call From 
> HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1141)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1100)
>       at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
>       ... 7 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
>       at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1117)
>       ... 9 more
> 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:Dispatcher is stopped.
> 2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped 
> [email protected]:9999
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 24290
> 2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 24290
> 2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
>  is stopped.
> 2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> {code}
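
In the trace above, the java.net.ConnectException is wrapped several layers deep (AvroRuntimeException, UndeclaredThrowableException, ServiceException).  Any retry logic would need to look through those wrappers to tell "RM not reachable" apart from other failures.  The following is only a sketch of that check using plain JDK classes; the class and method names are hypothetical and it is not taken from the patch.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.net.ConnectException;

public class CauseChain {

    /** Returns true if t or any of its causes is a ConnectException. */
    static boolean causedByConnectException(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mimics the nesting seen in the trace, using JDK exception types only.
        Throwable wrapped = new RuntimeException(
                new UndeclaredThrowableException(
                        new ConnectException("Connection refused")));
        System.out.println(causedByConnectException(wrapped));
        System.out.println(causedByConnectException(
                new IllegalStateException("unrelated failure")));
    }
}
```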

