[ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647627#action_12647627 ]
Steve Loughran commented on HADOOP-4659:
----------------------------------------

The problem could be (I repeat, could be) from HADOOP-2188, though I'm not sure. There have been too many changes to roll back, and it's easier to go forwards. I have a patch that (correctly) puts the task tracker back to retrying:

[sf-startdaemon-debug] 08/11/14 15:06:43 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:43 [Thread-41] INFO datanode.DataNode : BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:44 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:45 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:46 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.RPC : Server at localhost/127.0.0.1:8012 not available yet, Zzzzz...
[sf-startdaemon-debug] 08/11/14 15:06:49 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 0 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:50 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 1 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:51 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 2 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:52 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 3 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 4 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [Thread-41] INFO datanode.DataNode : BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:54 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:55 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:56 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:57 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.Client : Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.RPC : Server at localhost/127.0.0.1:8012 not available yet, Zzzzz...

> Root cause of connection failure is being lost to code that uses it for delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> In ipc.Client, the root cause of a connection failure is being lost as the exception is wrapped, so the outside code that looks for that root cause isn't working as expected. The result is that you can't bring up a task tracker before a job tracker, and probably the same for a datanode before a namenode.
> The change that triggered this has not yet been located; I had thought it was HADOOP-3844, but I no longer believe this is the case.
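
To illustrate the failure mode described in the issue, here is a minimal standalone sketch. It is not the actual ipc.Client code or the patch; the class RootCauseSketch and its methods wrapKeepingCause, wrapLosingCause and isConnectFailure are hypothetical names made up for this example. It only assumes that the calling code decides whether to keep waiting by looking for a java.net.ConnectException somewhere in the exception it receives.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch, not Hadoop code: shows how wrapping an exception
// without its cause hides a connection failure from the caller.
public class RootCauseSketch {

    // Wraps the failure with extra context and keeps the original as the cause.
    public static IOException wrapKeepingCause(String address, IOException e) {
        return new IOException("Call to " + address + " failed: " + e.getMessage(), e);
    }

    // Wraps the failure but copies only the message, so the cause is lost.
    public static IOException wrapLosingCause(String address, IOException e) {
        return new IOException("Call to " + address + " failed: " + e.getMessage());
    }

    // Walks the cause chain looking for a ConnectException, the way a caller
    // that wants to wait for a server to come up might do.
    public static boolean isConnectFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IOException root = new ConnectException("Connection refused");
        String address = "localhost/127.0.0.1:8012";
        // Cause preserved: the caller can see it is a connection failure and retry.
        System.out.println(isConnectFailure(wrapKeepingCause(address, root)));
        // Cause dropped: the caller sees a plain IOException and cannot tell.
        System.out.println(isConnectFailure(wrapLosingCause(address, root)));
    }
}
```

With the cause preserved the check succeeds and the caller can go back to sleeping and retrying; with the cause dropped the same check fails, which would match the symptom of a task tracker giving up when the job tracker is not yet listening.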