[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369186#comment-16369186
 ] 

Yiqun Lin commented on HDFS-13119:
----------------------------------

[~elgoiri], thanks for the review.
{quote}It may make sense to do the check for isClusterUnAvailable() inside of 
shouldRetry() instead of as a parameter.
{quote}
Addressed.
{quote}I didn't realize the other day, but I'm not sure we are controlling 
correctly when we hit the maximum number of threads. Right now, I think the 
executor will throw RejectedExecutionException, we may want to wrap this. Not 
sure how; we should let the client retry with another Router and one way is to 
return one of the StandbyExceptions like the RouterSafeModeException or a new 
one.
{quote}
I don't think {{RejectedExecutionException}} will be thrown when we hit the 
maximum number of threads. {{Executors#newFixedThreadPool}} backs the pool with 
a {{LinkedBlockingQueue}} whose capacity is {{Integer.MAX_VALUE}}, so pending 
tasks wait in the queue until a thread is available instead of being rejected. 
The javadoc says the same:
{noformat}
 At any point, at most nThreads threads will be active processing tasks. If 
additional tasks are submitted when all threads are active, they will wait in 
the queue until a thread is available. If any thread terminates due to a 
failure during execution prior to shutdown, a new one will take its place if 
needed to execute subsequent tasks.
{noformat}
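
For illustration only (this is not code from the patch), here is a minimal 
standalone sketch of that queueing behaviour. The class and method names 
({{FixedPoolQueueDemo}}, {{countRejections}}) are made up for the example, and 
it assumes the pool is created with the plain {{Executors#newFixedThreadPool}} 
factory and has not been shut down when the tasks are submitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop or of the HDFS-13119 patch.
public class FixedPoolQueueDemo {

  /**
   * Submits nTasks blocking tasks to a fixed pool of nThreads threads and
   * returns how many submissions were rejected.
   */
  static int countRejections(int nThreads, int nTasks)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    // Every task blocks on this latch, so all threads stay busy while the
    // remaining tasks are being submitted.
    CountDownLatch release = new CountDownLatch(1);
    List<Future<?>> futures = new ArrayList<>();
    int rejected = 0;
    try {
      for (int i = 0; i < nTasks; i++) {
        try {
          futures.add(pool.submit(() -> {
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }));
        } catch (RejectedExecutionException e) {
          // Counted only if the executor refuses the task.
          rejected++;
        }
      }
    } finally {
      // Unblock the workers and let the queued tasks drain.
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
    return rejected;
  }

  public static void main(String[] args) throws Exception {
    // 10 tasks on 2 threads: the 8 extra tasks wait in the queue.
    System.out.println("rejected = " + countRejections(2, 10));
  }
}
```

A pool built directly with {{ThreadPoolExecutor}} and a bounded queue would 
behave differently, so this only applies while we keep using the fixed-pool 
factory.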

Attached the updated patch.

> RBF: Manage unavailable clusters
> --------------------------------
>
>                 Key: HDFS-13119
>                 URL: https://issues.apache.org/jira/browse/HDFS-13119
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Yiqun Lin
>            Priority: Major
>         Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
