[ https://issues.apache.org/jira/browse/HDFS-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292963#comment-13292963 ]
Tsz Wo (Nicholas), SZE commented on HDFS-3504:
----------------------------------------------
> Not sure if exponential backoff is flexible enough. Typically one wants to
> retry every 10 sec till about a minute and then retry every 60 sec.
With the exponentialBackoff retry policy, and assuming the average sleep time
of the first retry is 1 second, we have the following.
|| n-th retry || average sleep time (seconds) || on average, the n-th retry will happen in (seconds) ||
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 4 | 7 |
| 4 | 8 | 15 |
| 5 | 16 | 31 |
| ... | ... | ... |
| n | 2^(n-1) | 2^n - 1 |
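The table can be reproduced with a few lines of Java. This is only a sketch of the arithmetic, assuming the average sleep doubles on every retry and starts at 1 second; BackoffSchedule and its methods are hypothetical names, not part of Hadoop or of the patch.

```java
// Hypothetical helper, not Hadoop code: reproduces the averages in the table.
final class BackoffSchedule {
    /** Average sleep before the n-th retry: 2^(n-1) seconds (first sleep assumed to be 1 second). */
    static long averageSleepSeconds(int n) {
        checkRange(n);
        return 1L << (n - 1);
    }

    /** Average elapsed time when the n-th retry happens: 1 + 2 + ... + 2^(n-1) = 2^n - 1 seconds. */
    static long averageElapsedSeconds(int n) {
        checkRange(n);
        return (1L << n) - 1;
    }

    // Keep the shifts well inside the range of a signed 64-bit long.
    private static void checkRange(int n) {
        if (n < 1 || n > 62) {
            throw new IllegalArgumentException("n must be in [1, 62], got " + n);
        }
    }

    public static void main(String[] args) {
        // Print one table row per retry, in the same column order as above.
        for (int n = 1; n <= 10; n++) {
            System.out.printf("| %d | %d | %d |%n",
                n, averageSleepSeconds(n), averageElapsedSeconds(n));
        }
    }
}
```

These are averages only; the actual sleep of each retry may vary around them.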
The value of dfs.client.retry.max should depend on the failover time. Suppose
the failover time is around 10 minutes. Then setting dfs.client.retry.max=10
will take ~17 minutes (2^10 - 1 = 1023 seconds) on average to finish all 10
retries. However, the last few retries will sleep for a long time, which I
think is undesirable. Let me think about this more.
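To make the trade-off concrete, here is a rough sketch that finds the smallest retry count whose average cumulative sleep covers a given failover window, and the average sleep of that final retry. It assumes the same doubling schedule starting at 1 second; RetryBudget and its methods are hypothetical names used for illustration only.

```java
// Hypothetical helper, not Hadoop code: sizes the retry count for a failover window.
final class RetryBudget {
    /** Smallest n such that the average elapsed time 2^n - 1 is at least windowSeconds. */
    static int retriesToCover(long windowSeconds) {
        if (windowSeconds < 0 || windowSeconds > (1L << 40)) {
            throw new IllegalArgumentException("windowSeconds out of range: " + windowSeconds);
        }
        int n = 0;
        long elapsed = 0; // average time slept so far
        long sleep = 1;   // average sleep of the next retry, doubling each time
        while (elapsed < windowSeconds) {
            elapsed += sleep;
            sleep *= 2;
            n++;
        }
        return n;
    }

    /** Average sleep of the n-th (last) retry: 2^(n-1) seconds. */
    static long lastSleepSeconds(int n) {
        if (n < 1 || n > 62) {
            throw new IllegalArgumentException("n must be in [1, 62], got " + n);
        }
        return 1L << (n - 1);
    }

    public static void main(String[] args) {
        long failoverSeconds = 10 * 60; // the 10-minute failover assumed above
        int n = retriesToCover(failoverSeconds);
        System.out.println("retries needed: " + n);
        System.out.println("average sleep of last retry (s): " + lastSleepSeconds(n));
    }
}
```

For a 600-second window this gives 10 retries, and the last retry alone sleeps 512 seconds on average, which is about half of the total 1023 seconds.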
> You forgot about the connection retry.
Sure, will also change it.
> Why is MiniDfsCluster changes needed?
I just moved the LOG message "Cluster is active" to waitActive(). I believe
it is a better place for it.
> Configurable retry in DFSClient
> -------------------------------
>
> Key: HDFS-3504
> URL: https://issues.apache.org/jira/browse/HDFS-3504
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 1.0.0, 2.0.0-alpha
> Reporter: Siddharth Seth
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h3504_20120607.patch, h3504_20120608.patch
>
>
> When NN maintenance is performed on a large cluster, jobs end up failing.
> This is particularly bad for long running jobs. The client retry policy could
> be made configurable so that jobs don't need to be restarted.