Nick Dimiduk created HBASE-24360:
------------------------------------
Summary: RollingBatchRestartRsAction loses track of dead servers
Key: HBASE-24360
URL: https://issues.apache.org/jira/browse/HBASE-24360
Project: HBase
Issue Type: Test
Components: integration tests
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
{{RollingBatchRestartRsAction}} doesn't handle failure cases when tracking its
list of dead servers. The original author believed that a failure to restart
would result in a retry. However, the action removes the server from the dead
servers list before it knows whether the restart succeeded, so that state is
lost and the server is never retried.
Because this action never looks back at the current state of the cluster and
relies only on its local state for the current action invocation, it never
realizes that the abandoned server is still dead. Instead, remove a dead server
from the list only when the {{startRs}} invocation claims to have succeeded.
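
A minimal sketch of the bookkeeping described above, not the actual patch. The class name {{DeadServerTracking}}, the method {{restartAll}}, the {{maxAttempts}} bound, and the modeling of {{startRs}} as a {{Predicate<String>}} that returns true on success are all assumptions made for illustration; the real action works with HBase's own server and cluster types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical stand-in for the action's restart loop; not HBase code.
final class DeadServerTracking {

  /**
   * Tries to restart every server in deadServers.
   *
   * @param deadServers local list of servers believed to be dead (mutated in place)
   * @param startRs     assumed stand-in for the startRs call; returns true on success
   * @param maxAttempts assumed upper bound on restart attempts, so the loop terminates
   * @return the servers that were restarted, in order
   */
  static List<String> restartAll(Queue<String> deadServers,
                                 Predicate<String> startRs,
                                 int maxAttempts) {
    List<String> restarted = new ArrayList<>();
    int attempts = 0;
    while (!deadServers.isEmpty() && attempts < maxAttempts) {
      attempts++;
      // Look at the next dead server WITHOUT removing it yet.
      String server = deadServers.peek();
      boolean started;
      try {
        started = startRs.test(server);
      } catch (RuntimeException e) {
        // Treat an exception the same as a reported failure.
        started = false;
      }
      if (started) {
        // Only now is it safe to forget that this server was dead.
        deadServers.remove();
        restarted.add(server);
      }
      // On failure the server stays at the head of the queue,
      // so the next iteration retries it.
    }
    return restarted;
  }
}
```

With a {{startRs}} stand-in that fails once for a server and then succeeds, the server remains in the queue after the failure and is restarted on the next attempt. If every attempt fails, the server is still in {{deadServers}} when the method returns, so the caller can see that it was abandoned.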