[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140249#comment-17140249
 ] 

Yuxuan Wang commented on HDFS-15419:
------------------------------------

[~bhji123]
Well, I agree more with [~ayushtkn]. I think we should remove the retry 
code currently in the router rather than add more retries to it.
I see [~elgoiri] reviewed the PR. What do you think of Saxena's comment?

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15419
>                 URL: https://issues.apache.org/jira/browse/HDFS-15419
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: configuration, hdfs-client, rbf
>            Reporter: bhji123
>            Priority: Major
>
> When a cluster is unavailable, router -> namenode communication retries only 
> once, with no time interval between attempts, which is not reasonable.
> For example, my company runs several HDFS clusters with more than 
> 1000 nodes, and we have encountered this problem. In some cases a cluster 
> becomes unavailable briefly, for about 10 or 30 seconds, and during that time 
> almost all RPC requests to the router fail because the router retries only 
> once, with no time interval.
> It would be better to enhance the router retry strategy so that it retries 
> communication with the NN using a configurable time interval and a 
> configurable maximum number of retries.
>  
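For reference, the strategy requested in the description (a configurable 
interval and a maximum number of retries) could look roughly like the sketch 
below. This is only an illustration, not the router's actual code and not the 
patch under review: the class FixedIntervalRetry, the method callWithRetry, and 
the parameters maxRetries and intervalMillis are hypothetical names, and 
treating only IOException as retriable is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Hadoop): retry with a fixed sleep interval. */
public final class FixedIntervalRetry {

  private FixedIntervalRetry() {}

  /**
   * Invokes the call once, then retries up to maxRetries more times when it
   * fails with an IOException, sleeping intervalMillis before each retry.
   * Any other exception is propagated immediately without retrying.
   */
  public static <T> T callWithRetry(Callable<T> call, int maxRetries, long intervalMillis)
      throws Exception {
    if (maxRetries < 0 || intervalMillis < 0) {
      throw new IllegalArgumentException("maxRetries and intervalMillis must be >= 0");
    }
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (attempt > 0) {
        // Wait before every retry, but not before the first attempt.
        Thread.sleep(intervalMillis);
      }
      try {
        return call.call();
      } catch (IOException e) {
        // Assumed to be retriable: remember the failure and try again.
        last = e;
      }
    }
    // All attempts failed; surface the most recent IOException.
    throw last;
  }
}
```

With maxRetries = 0 the call is attempted exactly once. Note that every retry 
holds the calling thread for the full interval, which is part of the concern 
about adding more retries inside the router.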



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
