[ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650567#action_12650567 ]

Steve Loughran commented on HADOOP-4659:
----------------------------------------

I'm going to push out my updated lifecycle patches shortly. One test I have 
there brings up a tasktracker without the rest of the infrastructure (DFS, 
jobtracker). That test now hangs until it times out: the tasktracker spins 
during setup, waiting for a job tracker that never arrives.


    [junit] Tue Nov 25 13:50:13 2008
    [junit] BEA JRockit(R) R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64
    [junit] "Main Thread" id=1 idx=0x4 tid=4074 prio=5 alive, in native, sleeping, native_waiting
    [junit]     at java/lang/Thread.sleep(J)V(Native Method)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.handleConnectionFailure(Client.java:364)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.setupIOstreams(Client.java:310)
    [junit]     ^-- Holding lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
    [junit]     at org/apache/hadoop/ipc/Client$Connection.access$1800(Client.java:177)
    [junit]     at org/apache/hadoop/ipc/Client.getConnection(Client.java:792)
    [junit]     at org/apache/hadoop/ipc/Client.call(Client.java:688)
    [junit]     at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:215)
    [junit]     at org/apache/hadoop/mapred/$Proxy0.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:347)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:334)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:371)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:308)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:285)
    [junit]     at org/apache/hadoop/mapred/TaskTracker.initialize(TaskTracker.java:454)
    [junit]     ^-- Holding lock: org/apache/hadoop/mapred/[EMAIL PROTECTED]
    [junit]     at org/apache/hadoop/mapred/TaskTracker.innerStart(TaskTracker.java:830)
    [junit]     ^-- Holding lock: org/apache/hadoop/mapred/[EMAIL PROTECTED] lock]
    [junit]     at org/apache/hadoop/util/Service.start(Service.java:186)
    [junit]     at org/apache/hadoop/util/Service.deploy(Service.java:654)
    [junit]     at org/apache/hadoop/mapred/TaskTracker.<init>(TaskTracker.java:965)
    [junit]     at org/apache/hadoop/mapred/TaskTracker.<init>(TaskTracker.java:948)

What I propose here is to give TaskTracker a timeout on its waitForProxy() 
operation, so that if the TT comes up before the JT there is a bit of 
leeway, but eventually the TT will conclude that it is an orphan and that it 
cannot start up.
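
To make the proposal concrete, here is a minimal sketch of a bounded wait in plain Java. It is not the Hadoop implementation: the class BoundedConnect, the method waitFor and its parameters are all hypothetical names, and the Callable stands in for whatever call actually obtains the proxy. The idea is only that ConnectException is retried until a deadline, after which the caller gets an IOException that still carries the last connection failure as its cause.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Hadoop): retry a connection attempt until a deadline. */
final class BoundedConnect {
    private BoundedConnect() {}

    /**
     * Calls attempt repeatedly while it fails with ConnectException.
     * Gives up once timeoutMillis has elapsed, keeping the last failure as the cause.
     */
    static <T> T waitFor(Callable<T> attempt, long timeoutMillis, long retryIntervalMillis)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            ConnectException lastFailure;
            try {
                return attempt.call();
            } catch (ConnectException e) {
                lastFailure = e; // server not reachable yet; remember why
            } catch (IOException | InterruptedException e) {
                throw e; // any other I/O failure or an interrupt is not retried
            } catch (Exception e) {
                throw new IOException("Unexpected failure while connecting", e);
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                // Out of leeway: report the timeout, but preserve the root cause.
                throw new IOException(
                        "Gave up after " + timeoutMillis + " ms waiting for the server", lastFailure);
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(retryIntervalMillis, remainingMillis));
        }
    }
}
```

With something of this shape, a TT that starts slightly ahead of the JT would still connect, while a TT with no JT at all would fail after the timeout instead of spinning forever, and a test harness would see the failure rather than hang.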

> Root cause of connection failure is being lost to code that uses it for 
> delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.18.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: connectRetry.patch, hadoop-4659.patch, 
> hadoop-4659.patch, rpcConn.patch, rpcConn1.patch
>
>
> In ipc.Client, the root cause of a connection failure is being lost as the 
> exception is wrapped, so the outside code that looks for that root cause 
> isn't working as expected. The result is that you can't bring up a task 
> tracker before the job tracker, and probably the same for a datanode before 
> a namenode. The change that triggered this has not yet been located; I had 
> thought it was HADOOP-3844, but I no longer believe this is the case.
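
The failure mode in the description above can be illustrated with a small, self-contained Java sketch. It does not reproduce the actual ipc.Client code; the class RootCause and its methods are hypothetical and only show the general pattern: a wrapper that keeps the original exception as its cause lets outside code find the ConnectException, while a wrapper that only copies the message does not.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical illustration (not Hadoop code) of how wrapping can hide a root cause. */
final class RootCause {
    private RootCause() {}

    /** Walks the cause chain looking for a ConnectException. */
    static boolean causedByConnectFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    /** Wraps while keeping the original exception reachable through getCause(). */
    static IOException wrapKeepingCause(String message, IOException cause) {
        return new IOException(message, cause);
    }

    /** Wraps by copying only the text; the original exception type is lost. */
    static IOException wrapLosingCause(String message, IOException cause) {
        return new IOException(message + ": " + cause);
    }
}
```

Code that delays startup by checking for a connection failure would keep retrying in the first case and give up (or misbehave) in the second, even though both messages look similar in a log.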

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
