[
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
bhji123 updated HDFS-15419:
---------------------------
Comment: was deleted
(was: Yes, but clients may not be configured appropriately. If the router can
retry too, it will be more reliable.)
> RBF: Router should retry communicate with NN when cluster is unavailable
> using configurable time interval
> ---------------------------------------------------------------------------------------------------------
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: configuration, hdfs-client, rbf
> Reporter: bhji123
> Priority: Major
>
> When a cluster is unavailable, router -> namenode communication retries only
> once, with no time interval between attempts, which is not reasonable.
> For example, my company runs several HDFS clusters with more than
> 1000 nodes, and we have encountered this problem. In some cases, a cluster
> becomes unavailable briefly, for about 10 or 30 seconds. During that time,
> almost all RPC requests to the router fail because the router retries only
> once, with no time interval.
> It would be better to enhance the router retry strategy so that it retries
> communication with the NN using a configurable time interval and a
> configurable maximum number of retries.
>
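
To make the proposal concrete, here is a minimal sketch of a retry loop with a
fixed, configurable interval and a bounded number of retries. It is plain Java
and is not the router's actual code. The class name FixedIntervalRetry and the
parameters maxRetries and intervalMs are hypothetical and do not correspond to
existing Hadoop configuration keys. Treating only IOException as retriable is
also an assumption.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry an operation with a fixed sleep between attempts. */
public class FixedIntervalRetry {

    /**
     * Calls op once, then up to maxRetries more times if it throws IOException,
     * sleeping intervalMs milliseconds between attempts.
     * Any other exception is propagated immediately without a retry.
     */
    public static <T> T call(Callable<T> op, int maxRetries, long intervalMs)
            throws Exception {
        if (maxRetries < 0 || intervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and intervalMs must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // assumed retriable, e.g. the NN is briefly unreachable
                if (attempt < maxRetries) {
                    Thread.sleep(intervalMs); // wait before the next attempt
                }
            }
        }
        // All attempts failed; surface the most recent failure to the caller.
        throw last;
    }
}
```

With maxRetries = 6 and intervalMs = 5000, for instance, a caller would keep
trying for roughly 30 seconds, which would cover the brief outages described
above. Suitable values depend on the cluster and on client timeouts.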
--
This message was sent by Atlassian Jira
(v8.3.4#803005)