Sergey Shelukhin created HBASE-22287:
----------------------------------------
Summary: infinite retries on failed server in
RSProcedureDispatcher
Key: HBASE-22287
URL: https://issues.apache.org/jira/browse/HBASE-22287
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
We observed this recently on a cluster. I'm still investigating the root
cause; however, it seems the retries should have special handling for this
exception and, separately, probably a cap on the number of retries.
{noformat}
2019-04-20 04:24:27,093 WARN [RSProcedureDispatcher-pool4-t1285]
procedure.RSProcedureDispatcher: request to server ,17020,1555742560432 failed
due to java.io.IOException: Call to :17020 failed on local exception:
org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed
servers list: :17020, try=26603, retrying...
{noformat}
The corresponding worker is stuck.
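To make the two suggestions concrete, here is a minimal, self-contained sketch of a retry loop that (a) stops as soon as the failure is caused by a failed-server exception and (b) caps the number of attempts. It is not the actual RSProcedureDispatcher code: `FailedServerExceptionSketch`, `CappedRetrySketch`, `RemoteCall`, `Outcome`, and the `maxAttempts` parameter are all hypothetical names made up for illustration. Giving up immediately on a failed server is only one possible policy.

```java
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.hbase.ipc.FailedServerException,
// defined here only so that the sketch compiles on its own.
class FailedServerExceptionSketch extends IOException {
    FailedServerExceptionSketch(String msg) {
        super(msg);
    }
}

// Hypothetical helper; not part of HBase.
final class CappedRetrySketch {
    enum Outcome { SUCCEEDED, GAVE_UP_FAILED_SERVER, GAVE_UP_MAX_ATTEMPTS }

    // Hypothetical abstraction of "send the request to the region server".
    interface RemoteCall {
        void run() throws IOException;
    }

    // Tries the call at most maxAttempts times. A failed-server error ends
    // the loop immediately instead of being retried.
    static Outcome dispatch(RemoteCall call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.run();
                return Outcome.SUCCEEDED;
            } catch (IOException e) {
                if (isFailedServer(e)) {
                    // Assumed policy: let the caller handle the dead server.
                    return Outcome.GAVE_UP_FAILED_SERVER;
                }
                // Other IOExceptions are treated as transient here. A real
                // implementation would likely back off before the next try.
            }
        }
        return Outcome.GAVE_UP_MAX_ATTEMPTS;
    }

    // In the log above the failed-server exception is wrapped in an
    // IOException, so the whole cause chain is inspected.
    static boolean isFailedServer(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof FailedServerExceptionSketch) {
                return true;
            }
        }
        return false;
    }
}
```

With a loop shaped like this, the try=26603 case in the log would end after the first attempt (failed server) or after `maxAttempts` attempts (any other IOException), and the caller could then release the worker and decide what to do with the procedure.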
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)