[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139085#comment-17139085 ]
bhji123 commented on HDFS-15419:
--------------------------------
[https://github.com/apache/hadoop/pull/2082]
Here is the PR to fix this problem.
> router retry with configurable time interval when cluster is unavailable
> ------------------------------------------------------------------------
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: configuration, hdfs-client, rbf
> Reporter: bhji123
> Priority: Major
>
> When a cluster is unavailable, router -> namenode communication retries only
> once, with no time interval between attempts, which is not reasonable.
> For example, my company runs several HDFS clusters with more than 1000 nodes,
> and we have encountered this problem. In some cases a cluster becomes
> unavailable briefly, for about 10 or 30 seconds. During that time almost all
> RPC requests to the router fail, because the router retries only once, with
> no time interval.
> It would be better to enhance the router retry strategy so that it retries
> with a configurable time interval and a configurable maximum number of retries.
>
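The retry behaviour requested above can be illustrated with a minimal Java
sketch. It is not the code from the linked PR and it does not use any Hadoop
API. The class name RetrySketch, the method callWithRetry, and the parameters
maxRetries and retryIntervalMs are hypothetical names chosen for illustration.
The sketch assumes a fixed sleep between attempts and treats only IOException
as retriable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry a call with a fixed interval, up to a maximum number of retries. */
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Invokes the call once, then retries up to maxRetries more times on IOException,
     * sleeping retryIntervalMs milliseconds between attempts.
     * Both parameters stand in for values that would come from configuration.
     */
    static <T> T callWithRetry(Callable<T> call, int maxRetries, long retryIntervalMs)
            throws Exception {
        if (maxRetries < 0 || retryIntervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and retryIntervalMs must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                // Assumption: an IOException means the downstream cluster is briefly unavailable.
                last = e;
                if (attempt < maxRetries) {
                    // Wait before the next attempt instead of retrying immediately.
                    Thread.sleep(retryIntervalMs);
                }
            }
        }
        // All attempts failed; surface the most recent failure to the caller.
        throw last;
    }
}
```

With, say, maxRetries = 10 and retryIntervalMs = 3000, a request in this sketch
would keep trying for roughly 30 seconds before failing, which would cover the
brief outages described above. Those two values are examples only.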
--
This message was sent by Atlassian Jira
(v8.3.4#803005)