[jira] Commented: (HADOOP-4659) Root cause of connection failure is being lost to code that uses it for delaying startup

Steve Loughran (JIRA) Fri, 14 Nov 2008 06:53:37 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647621#action_12647621 ]


Steve Loughran commented on HADOOP-4659:
----------------------------------------

Full stack trace:

Termination Record: HOST morzine.hpl.hp.com:rootProcess:testOrphanTracker:action:taskTracker, type: abnormal, description: Service has halted (this termination was not expected)
java.io.IOException: Call to localhost/127.0.0.1:8012 failed on local exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.call(Client.java:699)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.mapred.$Proxy7.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:453)
at org.apache.hadoop.mapred.TaskTracker.innerStart(TaskTracker.java:831)
at org.apache.hadoop.util.Service.start(Service.java:186)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.innerDeploy(HadoopServiceImpl.java:480)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.access$000(HadoopServiceImpl.java:47)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl$ServiceDeployerThread.execute(HadoopServiceImpl.java:630)
at org.smartfrog.sfcore.utils.SmartFrogThread.run(SmartFrogThread.java:279)
at org.smartfrog.sfcore.utils.WorkflowThread.run(WorkflowThread.java:117)

//and the nested exception

Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:771)
at org.apache.hadoop.ipc.Client.call(Client.java:685)
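
To illustrate what the trace shows: the ConnectException is still present as the nested cause, but the exception that reaches the caller is a plain java.io.IOException, so a check against the outer exception's type no longer matches. The following is a minimal sketch, not Hadoop code; the class CauseChain and its methods wrap() and findCause() are hypothetical names, and wrap() only imitates the message format seen in the trace above.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper for illustration only; not part of Hadoop.
class CauseChain {

    /**
     * Walks the cause chain of t and returns the first throwable that is
     * an instance of the requested type, or null if none is found.
     * The depth cap guards against a malformed, cyclic cause chain.
     */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (type.isInstance(c)) {
                return type.cast(c);
            }
        }
        return null;
    }

    /**
     * Imitates the wrapping visible in the trace: the original exception
     * becomes the cause of a new, generic IOException.
     */
    static IOException wrap(String address, IOException cause) {
        IOException wrapped = new IOException(
                "Call to " + address + " failed on local exception: " + cause);
        wrapped.initCause(cause);
        return wrapped;
    }

    public static void main(String[] args) {
        IOException e = wrap("localhost/127.0.0.1:8012",
                new ConnectException("Connection refused"));

        // A type check on the outer exception misses the wrapped failure.
        System.out.println("outer is ConnectException: "
                + (e instanceof ConnectException));

        // Searching the cause chain still finds it.
        System.out.println("ConnectException in chain: "
                + (findCause(e, ConnectException.class) != null));
    }
}
```

Code that catches ConnectException directly would need a check like findCause() to keep working once the exception is wrapped.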

> Root cause of connection failure is being lost to code that uses it for 
> delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> In ipc.Client, the root cause of a connection failure is being lost as the 
> exception is wrapped, so the outside code that looks for that root cause 
> isn't working as expected. The result is that you can't bring up a task 
> tracker before a job tracker, and probably the same applies to a datanode 
> before a namenode. The change that triggered this has not yet been located; 
> I had thought it was HADOOP-3844, but I no longer believe this is the case.
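
The startup-delay behaviour described above can be sketched as a retry loop that treats "connection refused" as "the server is not up yet" even when that failure is nested inside another exception. This is only an illustration under stated assumptions: StartupRetry, waitFor() and causedByConnectFailure() are hypothetical names, and the attempt count and sleep interval are arbitrary; it is not the logic of RPC.waitForProxy.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch of a startup retry loop; not Hadoop code.
class StartupRetry {

    /** Returns true if a ConnectException appears anywhere in the cause chain. */
    static boolean causedByConnectFailure(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Calls connect until it succeeds. An IOException caused by a refused
     * connection is retried after sleepMillis; any other failure, or running
     * out of attempts, rethrows the last exception.
     */
    static <T> T waitFor(Callable<T> connect, int maxAttempts, long sleepMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.call();
            } catch (IOException e) {
                if (!causedByConnectFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated server that refuses the first two connection attempts.
        String result = waitFor(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Call failed on local exception",
                        new ConnectException("Connection refused"));
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

If the loop only checked the type of the outer exception, the wrapped failure would be rethrown on the first attempt, which matches the symptom of a task tracker that cannot be started before its job tracker.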

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
