[jira] Updated: (HADOOP-4659) Root cause of connection failure is being lost to code that uses it for delaying startup

Hairong Kuang (JIRA) Tue, 18 Nov 2008 17:10:37 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hairong Kuang updated HADOOP-4659:
----------------------------------

    Attachment: rpcConn.patch

This patch checks the cause of the failure when an RPC client tries to connect
to an RPC server. It retries if the failure is caused by an unavailable or
busy server.
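
The retry decision described above can be sketched as follows. This is a minimal illustration, not the code in rpcConn.patch: the class name `ConnectRetryPolicy` is hypothetical, and it assumes that an unavailable server surfaces as `java.net.ConnectException` (connection refused) and a busy one as `java.net.SocketTimeoutException`. It walks the whole cause chain, so the check still works if the exception has been wrapped.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration; not taken from rpcConn.patch.
final class ConnectRetryPolicy {
    // Guard against pathological cause cycles (A -> B -> A).
    private static final int MAX_CAUSE_DEPTH = 32;

    private ConnectRetryPolicy() {}

    /**
     * Returns true if any exception in the cause chain of e suggests the
     * server is not up yet (connection refused) or is busy (timeout).
     */
    static boolean isRetriable(IOException e) {
        Throwable cur = e;
        for (int depth = 0; cur != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (cur instanceof ConnectException || cur instanceof SocketTimeoutException) {
                return true;
            }
            cur = cur.getCause(); // look past any wrapping layers
        }
        return false;
    }
}
```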

The patch also adds a new static method, waitForProxy, with a timeout, mainly
for testing purposes. A unit test is added to TestRPC to make sure that the
client retries. A manual test was also conducted, confirming that starting a
DataNode without starting the NameNode causes the DataNode to retry.
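
A wait-for-proxy loop with a timeout might look like the sketch below. The name `WaitForServer.waitFor` and its parameters are hypothetical and do not reproduce the signature of the waitForProxy method added by the patch: the connection attempt is passed in as a `Callable`, retries happen only on `ConnectException` or `SocketTimeoutException`, and the last failure is rethrown once the timeout expires.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical sketch; not the waitForProxy signature from the patch.
final class WaitForServer {
    private WaitForServer() {}

    /** Keeps calling connect until it succeeds or timeoutMillis elapses. */
    static <T> T waitFor(Callable<T> connect, long timeoutMillis, long retryIntervalMillis)
            throws Exception {
        final long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return connect.call();
            } catch (ConnectException | SocketTimeoutException e) {
                // Server unavailable or busy: retry unless we are out of time.
                if (System.currentTimeMillis() >= deadline) {
                    throw e;
                }
                Thread.sleep(retryIntervalMillis);
            }
            // Any other exception from connect.call() propagates immediately.
        }
    }
}
```

Note that the `catch` clause matches on the exception's own type, so it only works if the exception reaching the caller has not been wrapped in a generic `IOException`, which is exactly the failure mode this issue describes.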

Steve, could you please review and test the patch in your setup? I would
appreciate any feedback.

> Root cause of connection failure is being lost to code that uses it for delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.18.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: connectRetry.patch, hadoop-4659.patch, rpcConn.patch
>
>
> In ipc.Client, the root cause of a connection failure is being lost because the
> exception is wrapped, so the outside code that looks for that root cause isn't
> working as expected. The result is that you can't bring up a task tracker before
> the job tracker, and probably the same holds for a datanode before a namenode.
> The change that triggered this has not yet been located; I had thought it was
> HADOOP-3844, but I no longer believe this is the case.
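
One way to avoid losing the root cause is to keep the exception's type when adding context to it. The sketch below is an illustration under assumptions, not the fix that was committed: `ExceptionWrapping.wrapWithAddress` is a hypothetical helper that builds a new exception of the same class (`ConnectException` or `SocketTimeoutException`) with the server address in the message and the original attached via `initCause`, so code that catches the specific type keeps working.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration; not part of ipc.Client.
final class ExceptionWrapping {
    private ExceptionWrapping() {}

    /** Adds the server address to the message without hiding the exception type. */
    static IOException wrapWithAddress(InetSocketAddress addr, IOException e) {
        String msg = "Call to " + addr + " failed: " + e.getMessage();
        if (e instanceof ConnectException) {
            // Same class as the original, so catch (ConnectException ...) still matches.
            return (IOException) new ConnectException(msg).initCause(e);
        }
        if (e instanceof SocketTimeoutException) {
            return (IOException) new SocketTimeoutException(msg).initCause(e);
        }
        // Fallback: generic wrapper; the original stays reachable via getCause().
        return new IOException(msg, e);
    }
}
```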

-- 
This message is automatically generated by JIRA.
