[jira] [Created] (HBASE-21744) timeout for server list refresh calls

Sergey Shelukhin (JIRA) Fri, 18 Jan 2019 16:26:09 -0800

Sergey Shelukhin created HBASE-21744:
----------------------------------------


             Summary: timeout for server list refresh calls 
                 Key: HBASE-21744
                 URL: https://issues.apache.org/jira/browse/HBASE-21744
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Not sure why yet, but when the cluster is in an overall bad state we are seeing 
cases where, after an RS dies and its znode is deleted, the notification appears 
to be lost, so the master doesn't detect the failure. ZK itself appears to be 
healthy and doesn't report anything unusual.
After some other change is made to the server list, the master rescans the list 
and picks up the stale notification. It might make sense to add a config that 
triggers the refresh if it hasn't happened for a while (e.g. 1 minute).
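
A minimal sketch of the proposed timeout, to make the idea concrete. Everything 
here is an assumption for illustration: the class name StaleRefreshGuard, its 
methods, and the injected clock and refresh callback are hypothetical and are 
not existing HBase or ZooKeeper APIs. It only shows the "refresh if nothing has 
refreshed the list for N ms" logic, not how the master would actually rescan 
the znodes.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of HBase: forces a refresh after a quiet period. */
final class StaleRefreshGuard {
    private final long maxIdleMillis;   // assumed config value, e.g. 60_000 for 1 minute
    private final LongSupplier clock;   // injected so the logic can be tested without sleeping
    private final Runnable refresh;     // stands in for the master's server list rescan
    private long lastRefreshMillis;

    StaleRefreshGuard(long maxIdleMillis, LongSupplier clock, Runnable refresh) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
        this.refresh = refresh;
        this.lastRefreshMillis = clock.getAsLong();
    }

    /** Call whenever a notification-driven refresh completes. */
    synchronized void markRefreshed() {
        lastRefreshMillis = clock.getAsLong();
    }

    /**
     * Call periodically. Runs the refresh only if none has happened for at
     * least maxIdleMillis, and returns whether a refresh was forced.
     */
    synchronized boolean checkAndRefresh() {
        long now = clock.getAsLong();
        if (now - lastRefreshMillis < maxIdleMillis) {
            return false;
        }
        refresh.run();
        lastRefreshMillis = now;
        return true;
    }
}
```

Under these assumptions, checkAndRefresh() would be invoked from some periodic 
task at an interval shorter than the timeout, and markRefreshed() from the 
existing notification path, so a healthy cluster never pays for the extra 
rescan.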



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
