[ https://issues.apache.org/jira/browse/HDFS-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292963#comment-13292963 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3504:
----------------------------------------------

> Not sure if exponential backoff is flexible enough. Typically one wants to 
> retry every 10 sec till about a minute and then retry every 60 sec.

Suppose the exponentialBackoff retry policy is used and the average sleep time 
before the first retry is 1 second.  Then we have the following.

|| n-th retry || average sleep time before the n-th retry (seconds) || average time from the first failure until the n-th retry (seconds) ||
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 4 | 7 |
| 4 | 8 | 15 |
| 5 | 16 | 31 |
| ... | ... | ... |
| n | 2^(n-1) | 2^n - 1 |
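A minimal Java sketch that regenerates the table from its two formulas. It assumes each sleep exactly doubles and uses only the averages (the real policy may randomize individual sleeps, which is why the table speaks of averages); the class and method names are hypothetical.

```java
/**
 * Hypothetical helper reproducing the table above: with exponential backoff
 * and a 1-second average first sleep, the n-th retry sleeps 2^(n-1) s on
 * average and happens, on average, 2^n - 1 s after the first failure.
 * Assumption: averages only; no randomization is modeled.
 */
public class BackoffTable {

    /** Average sleep before the n-th retry, in seconds (n >= 1). */
    static long averageSleep(int n) {
        return 1L << (n - 1);          // 2^(n-1)
    }

    /** Average elapsed time until the n-th retry: the sum of all sleeps so far. */
    static long averageElapsed(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += averageSleep(i);
        }
        return total;                  // geometric series, equals 2^n - 1
    }

    public static void main(String[] args) {
        System.out.println("n\tsleep(s)\telapsed(s)");
        for (int n = 1; n <= 10; n++) {
            System.out.printf("%d\t%d\t%d%n", n, averageSleep(n), averageElapsed(n));
        }
    }
}
```

The row for n = 10 gives an average sleep of 512 s and an elapsed time of 1023 s, which is where the ~17-minute figure below comes from.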

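The value of dfs.client.retry.max should depend on the failover time.  Suppose 
the failover time is around 10 minutes.  Then setting dfs.client.retry.max=10 
means all 10 retries take ~17 minutes (2^10 - 1 = 1023 seconds) to finish.  
However, the last few retries sleep for a long time: the 10th retry alone 
sleeps about 512 seconds on average, so a client may sit idle for several 
minutes after the failover has already completed.  I think this is 
undesirable.  Let me think about this more.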
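To make the trade-off concrete, here is a hypothetical Java sketch comparing the exponential schedule with the one suggested in the quoted comment (retry every 10 s for about a minute, then every 60 s) for a failover that completes at exactly 600 s. It assumes average, non-randomized sleeps and that the first retry after the failover succeeds; all names are illustrative and are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical comparison of two retry schedules against a failover that
 * takes a fixed number of seconds. Times are elapsed seconds since the
 * first failure; sleeps are averages, not randomized draws.
 */
public class RetryScheduleComparison {

    /** Retry times for exponential backoff: the n-th retry at 2^n - 1 s. */
    static List<Long> exponential(int maxRetries) {
        List<Long> times = new ArrayList<>();
        long elapsed = 0;
        for (int n = 1; n <= maxRetries; n++) {
            elapsed += 1L << (n - 1);   // average sleep of 2^(n-1) s
            times.add(elapsed);
        }
        return times;
    }

    /** Retry times for "every 10 s for the first minute, then every 60 s". */
    static List<Long> tenThenSixty(long horizonSeconds) {
        List<Long> times = new ArrayList<>();
        long elapsed = 0;
        while (elapsed < horizonSeconds) {
            elapsed += (elapsed < 60) ? 10 : 60;
            times.add(elapsed);
        }
        return times;
    }

    /** First retry at or after the failover completes, or -1 if the client gave up. */
    static long firstRetryAfter(List<Long> times, long failoverSeconds) {
        for (long t : times) {
            if (t >= failoverSeconds) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long failover = 600;  // "around 10 minutes", as in the comment
        long exp = firstRetryAfter(exponential(10), failover);
        long lin = firstRetryAfter(tenThenSixty(failover), failover);
        System.out.printf("exponential:  reconnects at %d s, idle %d s%n", exp, exp - failover);
        System.out.printf("10s-then-60s: reconnects at %d s, idle %d s%n", lin, lin - failover);
    }
}
```

Under these assumptions, exponential backoff with 10 retries first reconnects at 1023 s, roughly 7 minutes after the NameNode is back, and with only 9 retries (last one at 511 s) it gives up before the failover finishes. The 10-then-60 schedule needs 15 retries to reach 600 s, but it never lags the end of the failover by more than 60 s.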

> You forgot about the connection retry.

Sure, I will change that as well.

> Why are the MiniDfsCluster changes needed?

I just moved the LOG message "Cluster is active" to waitActive(); I believe 
that is a better place for it.

                
> Configurable retry in DFSClient
> -------------------------------
>
>                 Key: HDFS-3504
>                 URL: https://issues.apache.org/jira/browse/HDFS-3504
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Siddharth Seth
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: h3504_20120607.patch, h3504_20120608.patch
>
>
> When NN maintenance is performed on a large cluster, jobs end up failing. 
> This is particularly bad for long running jobs. The client retry policy could 
> be made configurable so that jobs don't need to be restarted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
