[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356614#comment-16356614 ]

Yiqun Lin edited comment on HDFS-13119 at 2/8/18 8:05 AM:
----------------------------------------------------------

I just looked into this:
{quote}When a federated cluster has one of the subcluster down, operations that 
run in every subcluster (RouterRpcClient#invokeAll()) may take all the RPC 
connections.
{quote}
Looking into the related code, I didn't see any logic that triggers RPC 
requests to every subcluster once one subcluster is down. I only looked at the 
method {{RouterRpcClient#invoke}}, which is called from 
{{RouterRpcClient#invokeMethod}}. Correct me if I am wrong.

{quote}
Better control of the number of RPC clients
{quote}
I'm not clear on this. Do you mean we should have a maximum RPC queue size on 
the Router RPC server side?
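
For illustration only, a minimal sketch of what a maximum queue size could mean, using a plain {{java.util.concurrent.ThreadPoolExecutor}} as a stand-in for the Router RPC server handlers (an assumption; the real server's handler and queue setup is not modeled). The class {{BoundedRpcQueueSketch}} and the sizes are hypothetical. With 2 handlers and a queue of 3, calls stuck on an unavailable subcluster occupy at most 5 slots, and further calls are rejected immediately instead of piling up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed number of handlers plus a bounded queue.
// Not the Router RPC server; just the queue-size idea in isolation.
public class BoundedRpcQueueSketch {

  /** Submits `calls` blocking tasks and returns {accepted, rejected}. */
  static int[] run(int handlers, int maxQueue, int calls)
      throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        handlers, handlers, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(maxQueue),      // bounded queue
        new ThreadPoolExecutor.AbortPolicy());   // reject when full

    CountDownLatch release = new CountDownLatch(1);
    int accepted = 0;
    int rejected = 0;
    for (int i = 0; i < calls; i++) {
      try {
        // Each task simulates a call hanging on an unavailable subcluster.
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        accepted++;
      } catch (RejectedExecutionException e) {
        rejected++;
      }
    }
    // Unblock the simulated calls and shut down cleanly.
    release.countDown();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return new int[] {accepted, rejected};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] r = run(2, 3, 10);
    System.out.println("accepted=" + r[0] + " rejected=" + r[1]);
  }
}
```

Running {{main}} prints {{accepted=5 rejected=5}}: 2 calls occupy the handlers, 3 wait in the queue, and the remaining 5 fail fast.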

I have a proposal for "No need to try so many times if we 'know' the subcluster 
is down": when a failure happens, query the {{ActiveNamenodeResolver}} to check 
whether that subcluster is down; if it is, don't retry. In addition, the current 
default retry count (10) could be reduced considerably.
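
A minimal sketch of this proposal, assuming a hypothetical {{FailFastRetrySketch#invokeWithRetry}} helper. The {{Predicate<String>}} stands in for the availability check against {{ActiveNamenodeResolver}}; the resolver's actual API is not used here, and the nameservice ID {{ns1}} is illustrative. {{maxRetries}} plays the role of the current default of 10:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch: stop retrying as soon as the target subcluster is
// known to be down. The Predicate stands in for a lookup against
// ActiveNamenodeResolver; its real API is not modeled here.
public class FailFastRetrySketch {

  static <T> T invokeWithRetry(String nsId, Callable<T> call,
      Predicate<String> subclusterDown, int maxRetries) throws IOException {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    IOException last = null;
    // One initial attempt plus up to maxRetries retries.
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e;
        // After a failure, ask whether the subcluster is known to be down.
        if (subclusterDown.test(nsId)) {
          throw new IOException(
              "Subcluster " + nsId + " is unavailable, not retrying", e);
        }
      } catch (Exception e) {
        // Non-I/O failures are not retried in this sketch.
        throw new IOException(e);
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    int[] attempts = {0};
    try {
      invokeWithRetry("ns1", () -> {
        attempts[0]++;
        throw new IOException("Connection refused");
      }, ns -> true, 10);
    } catch (IOException e) {
      System.out.println(e.getMessage() + " after " + attempts[0] + " attempt(s)");
    }
  }
}
```

When the subcluster is reported down, the call fails after a single attempt; otherwise it makes up to {{maxRetries + 1}} attempts, so lowering the default also bounds how long a caller can hold a connection.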

> RBF: Manage unavailable clusters
> --------------------------------
>
>                 Key: HDFS-13119
>                 URL: https://issues.apache.org/jira/browse/HDFS-13119
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Yiqun Lin
>            Priority: Major
>
> When a federated cluster has one of the subcluster down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)