[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369186#comment-16369186 ]
Yiqun Lin commented on HDFS-13119:
----------------------------------

[~elgoiri], thanks for the review.
{quote}It may make sense to do the check for isClusterUnAvailable() inside of shouldRetry() instead of as a parameter.
{quote}
Addressed.
{quote}I didn't realize the other day, but I'm not sure we are controlling correctly when we hit the maximum number of threads. Right now, I think the executor will throw RejectedExecutionException, we may want to wrap this. Not sure how; we should let the client retry with another Router and one way is to return one of the StandbyExceptions like the RouterSafeModeException or a new one.
{quote}
I don't think {{RejectedExecutionException}} will be thrown when we hit the maximum number of threads. {{Executors#newFixedThreadPool}} stores pending tasks in a {{LinkedBlockingQueue}} with a capacity of {{Integer.MAX_VALUE}}, so additional tasks wait in the queue until a thread becomes available. The javadoc says the same:
{noformat}
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.
{noformat}
I have attached the updated patch.

> RBF: Manage unavailable clusters
> --------------------------------
>
>                 Key: HDFS-13119
>                 URL: https://issues.apache.org/jira/browse/HDFS-13119
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Yiqun Lin
>            Priority: Major
>         Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch,
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC
> connections.
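
The queueing behaviour discussed in the comment above can be checked with a small standalone sketch. It is not taken from the patch: the class name {{FixedPoolQueueDemo}}, its helper methods, and the pool and queue sizes are all hypothetical and chosen only for illustration. It submits blocking tasks to a pool created with {{Executors#newFixedThreadPool}} and, for contrast, to a {{ThreadPoolExecutor}} with a small bounded queue, and counts how many submissions throw {{RejectedExecutionException}}.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class; not part of Hadoop or of the HDFS-13119 patch. */
public class FixedPoolQueueDemo {

  /** Submits {@code tasks} blocking tasks and returns how many were rejected. */
  public static int countRejections(ExecutorService pool, int tasks)
      throws InterruptedException {
    // Every task blocks on this latch, so all worker threads stay busy
    // while the remaining tasks are being submitted.
    CountDownLatch release = new CountDownLatch(1);
    int rejected = 0;
    try {
      for (int i = 0; i < tasks; i++) {
        try {
          pool.execute(() -> {
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
        } catch (RejectedExecutionException e) {
          rejected++;
        }
      }
    } finally {
      // Unblock the tasks and let the pool drain before returning.
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
    return rejected;
  }

  /** Fixed pool as created by Executors: 2 threads (assumed size), unbounded queue. */
  public static ExecutorService fixedPool() {
    return Executors.newFixedThreadPool(2);
  }

  /** Contrast case: 2 threads and a queue bounded to 2 pending tasks (assumed sizes). */
  public static ExecutorService boundedPool() {
    return new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(2));
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("fixed pool rejections:   "
        + countRejections(fixedPool(), 100));
    System.out.println("bounded pool rejections: "
        + countRejections(boundedPool(), 100));
  }
}
```

With these assumed sizes, the fixed pool should report no rejections, because the extra tasks simply wait in the queue. The bounded pool rejects every task beyond the two that are running and the two that are queued. Note that an unbounded queue avoids rejection but does not limit how many pending tasks can accumulate.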