[jira] Commented: (HADOOP-4659) Root cause of connection failure is being lost to code that uses it for delaying startup

Steve Loughran (JIRA) Fri, 14 Nov 2008 07:21:35 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647627#action_12647627
 ]


Steve Loughran commented on HADOOP-4659:
----------------------------------------

The problem could be (I repeat, could be) from HADOOP-2188, though I'm not 
sure. There have been too many changes to roll back, and it's easier to go 
forwards. 

I have a patch that (correctly) puts the task tracker back to retrying:
[sf-startdaemon-debug] 08/11/14 15:06:43 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:43 [Thread-41] INFO datanode.DataNode : 
BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:44 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:45 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:46 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.RPC : Server at 
localhost/127.0.0.1:8012 not available yet, Zzzzz...
[sf-startdaemon-debug] 08/11/14 15:06:49 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 0 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:50 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 1 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:51 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 2 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:52 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 3 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 4 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [Thread-41] INFO datanode.DataNode : 
BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:54 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:55 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:56 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:57 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.Client : 
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.RPC : Server at 
localhost/127.0.0.1:8012 not available yet, Zzzzz...
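The log above shows the intended pattern: ten connection attempts about a second apart, a longer nap ("Zzzzz..."), then the attempt counter starts again at zero. A minimal sketch of that kind of loop follows. It is not the Hadoop code: the class name, method name, and parameters are hypothetical, and it assumes the waiting code tells "server not up yet" apart from other failures by catching java.net.ConnectException.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class WaitForServerSketch {
    /**
     * Hypothetical helper, not Hadoop's API. Calls connect up to maxRetries
     * times per round, napping between rounds. Only ConnectException is
     * treated as "not available yet"; any other exception propagates at once.
     */
    public static <T> T waitForServer(Callable<T> connect, int maxRetries,
                                      long retryMillis, long napMillis,
                                      int maxRounds) throws Exception {
        for (int round = 0; round < maxRounds; round++) {
            for (int tried = 0; tried < maxRetries; tried++) {
                try {
                    return connect.call();
                } catch (ConnectException e) {
                    // Same shape as the "Already tried N time(s)" lines above.
                    System.out.println("Retrying connect to server. Already tried "
                            + tried + " time(s).");
                    Thread.sleep(retryMillis);
                }
            }
            System.out.println("Server not available yet, Zzzzz...");
            Thread.sleep(napMillis);
        }
        throw new ConnectException("Server did not come up after "
                + maxRounds + " round(s)");
    }
}
```

A loop like this only keeps waiting if the exception it sees is still recognisable as a connection failure, which is why losing the root cause stops a task tracker from waiting for the job tracker.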


> Root cause of connection failure is being lost to code that uses it for 
> delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> In ipc.Client, the root cause of a connection failure is being lost when the 
> exception is wrapped, so the outside code that looks for that root cause 
> isn't working as expected. The result is that you can't bring up a task 
> tracker before the job tracker, and probably the same for a datanode before 
> a namenode. The change that triggered this has not yet been located; I had 
> thought it was HADOOP-3844, but I no longer believe this is the case.
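To illustrate the failure mode described in the issue, here is a minimal sketch of how a root cause survives or is lost when an exception is wrapped. It is not the actual patch or the ipc.Client code: the class and method names are hypothetical, and it uses only the standard Throwable cause chain.

```java
import java.io.IOException;
import java.net.ConnectException;

public class RootCauseSketch {
    /**
     * Hypothetical helper: walks the cause chain and returns the first
     * throwable of the requested type, or null if there is none.
     */
    public static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        int depth = 0;
        while (t != null && depth++ < 32) { // depth cap guards against cyclic chains
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            t = t.getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        ConnectException root = new ConnectException("Connection refused");

        // Wrapping that drops the cause: callers can no longer see the ConnectException.
        IOException lost = new IOException("Call failed: " + root.getMessage());

        // Wrapping that keeps the cause: callers can still recover it.
        IOException kept = new IOException("Call failed", root);

        System.out.println("lost wrapper has root cause: "
                + (findCause(lost, ConnectException.class) != null));
        System.out.println("kept wrapper has root cause: "
                + (findCause(kept, ConnectException.class) != null));
    }
}
```

Code that delays startup until the server is reachable can only do so in the second case, where the wrapper still carries the original exception as its cause.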

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
