[
https://issues.apache.org/jira/browse/HDFS-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568031#comment-14568031
]
Chris Nauroth commented on HDFS-8510:
-------------------------------------
The current situation is problematic for rolling upgrades in deployments that
have set {{ipc.client.connect.max.retries}} and/or
{{ipc.client.connect.retry.interval}} to something higher than the default.
This command is run in situations where the DataNode is expected to be down,
so the expected outcome is a failed connection. The command can therefore
spend a long time in a connection retry loop. In the worst case, a script
that stops and then restarts a DataNode waits so long for the retry loop to
finish that it cannot restart the DataNode within the 30-second deadline
required for OOB ack response handling in the client. Missing this deadline
forces clients into pipeline recoveries, which is suboptimal.
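To see how the retry loop can blow past the 30-second budget, here is a
back-of-the-envelope calculation. The values below are illustrative raised
settings, not the Hadoop defaults (which are 10 retries at 1000 ms):

```shell
# Worst-case time -getDatanodeInfo spends retrying a dead DataNode,
# using example (non-default) values an operator might have configured.
RETRIES=45        # ipc.client.connect.max.retries, raised from default 10
INTERVAL_MS=2000  # ipc.client.connect.retry.interval, raised from 1000 ms
WAIT_S=$(( RETRIES * INTERVAL_MS / 1000 ))
echo "worst-case connect wait: ${WAIT_S}s"  # prints: worst-case connect wait: 90s
```

At 90 seconds, a stop-then-restart script has already missed the 30-second
OOB ack window three times over.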
To minimize surprises for existing deployments, let's set these new timeout
configuration properties to use the same default values as
{{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}.
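As a sketch, the hdfs-site.xml overrides might look like the following. The
property names here are hypothetical placeholders for illustration only; the
actual names are decided in the HDFS-8510 patch. The defaults mirror
{{ipc.client.connect.max.retries}} (10) and
{{ipc.client.connect.retry.interval}} (1000 ms):

```xml
<!-- Hypothetical property names; not the names committed by HDFS-8510. -->
<property>
  <name>dfs.client.getdatanodeinfo.connect.max.retries</name>
  <value>10</value>
</property>
<property>
  <name>dfs.client.getdatanodeinfo.connect.retry.interval</name>
  <value>1000</value>
</property>
```

An operator who needs a fast failure for the shutdown check could then lower
only these two values without disturbing the cluster-wide IPC retry policy.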
> Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.
> ----------------------------------------------------------------------
>
> Key: HDFS-8510
> URL: https://issues.apache.org/jira/browse/HDFS-8510
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: tools
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
>
> During a rolling upgrade, an administrator runs {{hdfs dfsadmin
> -getDatanodeInfo}} to check if a DataNode has stopped. Currently, this
> operation is subject to the RPC connection retries defined in
> {{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}.
> This issue proposes adding separate configuration properties to control the
> retries for this operation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)