Nick Dimiduk created HBASE-24360:
------------------------------------

             Summary: RollingBatchRestartRsAction loses track of dead servers
                 Key: HBASE-24360
                 URL: https://issues.apache.org/jira/browse/HBASE-24360
             Project: HBase
          Issue Type: Test
          Components: integration tests
    Affects Versions: 2.3.0
            Reporter: Nick Dimiduk
            Assignee: Nick Dimiduk


{{RollingBatchRestartRsAction}} doesn't handle failure cases when tracking its 
list of dead servers. The original author believed that a failure to restart 
would result in a retry. However, the action removes the dead server from its 
list before the restart is known to have succeeded, so that state is lost and 
the server is never retried. Because this action never looks back at the 
current state of the cluster, relying only on its local state for the current 
invocation, it never realizes the abandoned server is still dead. Instead, be 
more careful to remove the dead server from the list only when the {{startRs}} 
invocation claims to have been successful.
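
The proposed behavior can be sketched roughly as follows. This is a minimal, 
standalone illustration and not the actual patch: the class name 
DeadServerTracking, the method restartNext, the use of plain strings for server 
names, and the Predicate standing in for the {{startRs}} call are all 
assumptions made for the example.

```java
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical stand-in for the action's dead-server bookkeeping.
class DeadServerTracking {

  /**
   * Attempts to start the server at the head of the dead-server queue.
   * The server is removed from the queue only when the start attempt
   * reports success; on failure it stays queued so that a later
   * iteration can retry it.
   *
   * @param deadServers servers this action has stopped and not yet restarted
   * @param startRs     hypothetical stand-in for startRs; returns true on success
   * @return true if a server was started and removed from the queue
   */
  static boolean restartNext(Deque<String> deadServers, Predicate<String> startRs) {
    // Look at the head without removing it, so the state survives a failure.
    String server = deadServers.peek();
    if (server == null) {
      return false; // nothing left to restart
    }

    boolean started;
    try {
      started = startRs.test(server);
    } catch (RuntimeException e) {
      // Treat an exception from the start attempt as a failed start.
      started = false;
    }

    if (started) {
      // Only now is it safe to forget about this server.
      deadServers.remove();
    }
    return started;
  }
}
```

The point of the sketch is the ordering: peek first, start, then remove on 
success. Removing the entry before the start attempt (for example with poll) 
is what loses track of a server whose restart fails.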



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
