[ https://issues.apache.org/jira/browse/HBASE-22287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823593#comment-16823593 ]
Duo Zhang commented on HBASE-22287:
-----------------------------------

Has the region server already been marked as dead at master side? If not, then I think this is intentional. We can only give up when we make sure that the RS is dead.

> inifinite retries on failed server in RSProcedureDispatcher
> -----------------------------------------------------------
>
>                 Key: HBASE-22287
>                 URL: https://issues.apache.org/jira/browse/HBASE-22287
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> We observed this recently on some cluster, I'm still investigating the root
> cause however seems like the retries should have special handling for this
> exception; and separately probably a cap on number of retries
> {noformat}
> 2019-04-20 04:24:27,093 WARN [RSProcedureDispatcher-pool4-t1285]
> procedure.RSProcedureDispatcher: request to server ,17020,1555742560432
> failed due to java.io.IOException: Call to :17020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: :17020, try=26603, retrying...
> {noformat}
> The corresponding worker is stuck
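
For illustration only, the two positions in this thread (give up only once the server is confirmed dead, versus also capping the number of retries) can be combined in one retry loop. The sketch below is language-neutral Python, not HBase code: `FailedServerError`, `dispatch_with_retries`, `is_server_dead`, the cap of 10 attempts, and the backoff values are all hypothetical names and numbers chosen for the example, and the real RSProcedureDispatcher may behave differently.

```python
import time


class FailedServerError(IOError):
    """Hypothetical stand-in for a 'server is in the failed servers list' error."""


def dispatch_with_retries(send, is_server_dead, max_attempts=10,
                          base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Retry send() until it succeeds, the server is confirmed dead,
    or the (assumed) retry cap is reached. Returns a short status string."""
    failures = 0
    while True:
        # Give up immediately once the server is known to be dead.
        if is_server_dead():
            return "abandoned: server confirmed dead"
        try:
            send()
            return "sent"
        except FailedServerError:
            failures += 1
            # The cap is the reporter's suggestion, not confirmed behavior.
            if failures >= max_attempts:
                return "gave up: retry cap reached"
            # Exponential backoff, bounded so waits do not grow without limit.
            sleep(min(base_delay * 2 ** failures, max_delay))
```

In this sketch a caller that receives "gave up: retry cap reached" would still have to decide what to do with the request, for example escalate to whatever component decides server liveness, since the server has not been confirmed dead at that point.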