[ https://issues.apache.org/jira/browse/HADOOP-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651331#action_12651331 ]

Steve Loughran commented on HADOOP-4724:
----------------------------------------

>Something like datanode.connect.timeout, tasktracker.connect.timeout, 
>dfsclient.connect.timeout...

Maybe make it explicit in the names that these are IPC timeouts, not, say, HTTP timeouts:

datanode.ipc.connect.timeout
tasktracker.ipc.connect.timeout
dfsclient.ipc.connect.timeout
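As a rough illustration of the naming scheme, the keys can be derived from the daemon role with a fixed ".ipc.connect.timeout" suffix. This is a hypothetical helper: the key names are only the ones proposed in this thread, not existing Hadoop properties, and the Role enum is made up for the example.

```java
import java.util.Locale;

// Hypothetical helper: derives the proposed per-daemon IPC connect-timeout keys.
public class IpcTimeoutKeys {
    // Daemon roles named in the proposal (illustrative only).
    enum Role { DATANODE, TASKTRACKER, DFSCLIENT }

    // Builds "<role>.ipc.connect.timeout". The ".ipc" segment marks these as IPC
    // timeouts so they are not confused with HTTP or other transport settings.
    // Locale.ROOT avoids locale-dependent lowercasing (e.g. Turkish dotless i).
    static String connectTimeoutKey(Role role) {
        return role.name().toLowerCase(Locale.ROOT) + ".ipc.connect.timeout";
    }

    public static void main(String[] args) {
        for (Role r : Role.values()) {
            System.out.println(connectTimeoutKey(r));
        }
    }
}
```

Running it prints the three key names listed above, one per line.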

>I am thinking to start with a large number like 1 hour or 1 day. It is at 
>least backwards compatible.

24 hours would be good. It lets you ride out the kind of outage that gets the team
paged in from home, and it removes the "fix this in 15 minutes before the nodes
start giving up" crisis.
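A minimal sketch of a 24-hour default with a per-cluster override, assuming the value is stored in milliseconds (the unit is an assumption) and using a plain Map as a stand-in for the daemon's configuration object. The key names are the hypothetical ones proposed above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectTimeoutDefault {
    // Suggested default: 24 hours, in milliseconds (assumed unit). 86,400,000 ms.
    static final long DEFAULT_CONNECT_TIMEOUT_MS = 24L * 60 * 60 * 1000;

    // Looks up a timeout in a map standing in for the configuration;
    // falls back to the default when the key is absent.
    static long connectTimeoutMs(Map<String, String> conf, String key) {
        String v = conf.get(key);
        return (v == null) ? DEFAULT_CONNECT_TIMEOUT_MS : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // A test cluster might override the default to give up after 15 minutes.
        conf.put("tasktracker.ipc.connect.timeout", String.valueOf(15L * 60 * 1000));
        System.out.println(connectTimeoutMs(conf, "tasktracker.ipc.connect.timeout")); // 900000
        System.out.println(connectTimeoutMs(conf, "datanode.ipc.connect.timeout"));    // 86400000
    }
}
```

Production clusters keep the long default, while test clusters set a short value so they fail sooner rather than later.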

> TaskTracker, DataNode, and SecondaryNameNode should timeout on waiting for 
> its server to be up
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4724
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Hairong Kuang
>             Fix For: 0.20.0
>
>
> TaskTracker, DataNode, and SecondaryNameNode currently wait forever if its 
> server is not up. They should be designed to take a configuration parameter 
> that tells them when to give up, and a default value of many minutes/hours or 
> more to deal with basic choreography issues in a cluster. Test clusters can 
> be set up to fail sooner rather than later.
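The change the issue asks for amounts to replacing an unbounded wait with a retry loop that has a deadline. A minimal sketch, assuming a generic probe: waitForServer and its parameters are hypothetical, and real code would attempt an IPC connection to the NameNode or JobTracker instead of calling a BooleanSupplier.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForServer {
    // Retries the probe until it returns true or timeoutMs elapses.
    // Returns true if the server came up, false if the deadline passed first.
    static boolean waitForServer(BooleanSupplier probe, long timeoutMs, long retryIntervalMs) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (probe.getAsBoolean()) {
                return true;                          // server answered
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false;                         // give up instead of waiting forever
            }
            // Sleep for the retry interval, but not much past the deadline.
            long sleepMs = Math.max(1, Math.min(retryIntervalMs,
                    TimeUnit.NANOSECONDS.toMillis(remainingNs)));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                return false;                         // treat interruption as giving up
            }
        }
    }

    public static void main(String[] args) {
        // Simulated server that comes up on the third connection attempt.
        int[] attempts = {0};
        boolean up = waitForServer(() -> ++attempts[0] >= 3, 5_000, 10);
        System.out.println(up + " after " + attempts[0] + " attempts"); // true after 3 attempts

        // Simulated server that never comes up: gives up after roughly 50 ms.
        System.out.println(waitForServer(() -> false, 50, 10)); // false
    }
}
```

Returning false rather than throwing leaves it to the daemon to decide whether to log and exit or keep going.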

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.