[jira] [Created] (HBASE-27593) Clear meta cache for full server when handling FailedServerException

Bryan Beaudreault (Jira) Wed, 25 Jan 2023 08:09:22 -0800

Bryan Beaudreault created HBASE-27593:
-----------------------------------------


             Summary: Clear meta cache for full server when handling FailedServerException
                 Key: HBASE-27593
                 URL: https://issues.apache.org/jira/browse/HBASE-27593
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


Currently, when a request to a region fails, we clear the meta cache only for
that individual region. This is the right choice in most cases, because clearing
the cache for an entire server is much more expensive: if a server hosts 100
regions, unnecessarily clearing the cache for the whole server would cause 100
meta requests per client.
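To make the cost difference concrete, here is a minimal model of a client-side
region location cache. It is not HBase's actual MetaCache API: the class and
method names (LocationCache, locate, clear_region, clear_server) are
hypothetical, and every cache miss is assumed to cost exactly one meta request.

```python
class LocationCache:
    """Hypothetical client-side cache mapping region name -> hosting server."""

    def __init__(self, meta):
        self.meta = meta          # authoritative region -> server map (stands in for hbase:meta)
        self.cache = {}           # this client's cached locations
        self.meta_requests = 0    # count of simulated meta lookups

    def locate(self, region):
        # Cache hit: no meta request needed.
        if region in self.cache:
            return self.cache[region]
        # Cache miss: simulate one round trip to meta.
        self.meta_requests += 1
        server = self.meta[region]
        self.cache[region] = server
        return server

    def clear_region(self, region):
        # Drop only the single failing region's location.
        self.cache.pop(region, None)

    def clear_server(self, server):
        # Drop every cached region location that points at this server.
        for region in [r for r, s in self.cache.items() if s == server]:
            del self.cache[region]


# One server hosting 100 regions, with every location already cached.
meta = {f"region-{i}": "rs1" for i in range(100)}
client = LocationCache(meta)
for region in meta:
    client.locate(region)
warm = client.meta_requests  # 100 lookups to warm the cache

client.clear_region("region-0")
for region in meta:
    client.locate(region)
print(client.meta_requests - warm)  # prints 1

client.clear_server("rs1")
before = client.meta_requests
for region in meta:
    client.locate(region)
print(client.meta_requests - before)  # prints 100
```

Under these assumptions, a per-region clear costs one meta request, while a
server-wide clear costs one request per hosted region, for every client that
touches those regions.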

However, when a client fails to connect to a RegionServer, that server is added
to the client's FailedServers list. Subsequent requests to that server are
fast-failed with a FailedServerException instead of attempting a new connection.
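The fast-fail behavior can be modeled as a small expiring blocklist. This is a
simplified sketch, not HBase's FailedServers class: the names FailedServerList,
add_to_failed, is_failed and call, as well as the 2-second expiry window, are
assumptions for illustration. Time is passed in explicitly so the example is
deterministic.

```python
class FailedServerException(Exception):
    """Raised when a request targets a server on the failed list (fast-fail)."""


class FailedServerList:
    """Hypothetical expiring list of servers that recently failed to connect."""

    def __init__(self, expiry_seconds=2.0):
        self.expiry_seconds = expiry_seconds  # assumed expiry window
        self.failed_until = {}                # server -> time when it stops counting as failed

    def add_to_failed(self, server, now):
        self.failed_until[server] = now + self.expiry_seconds

    def is_failed(self, server, now):
        deadline = self.failed_until.get(server)
        if deadline is None:
            return False
        if now >= deadline:
            # Entry expired: forget it so the next request tries a real connection.
            del self.failed_until[server]
            return False
        return True


def call(server, failed, connect, now):
    """Send a request, fast-failing if the server is on the failed list."""
    if failed.is_failed(server, now):
        raise FailedServerException(f"{server} is in the failed servers list")
    try:
        return connect(server)
    except ConnectionError:
        # A real connection failure puts the server on the list for the expiry window.
        failed.add_to_failed(server, now)
        raise


def refuse(server):
    raise ConnectionError(f"cannot connect to {server}")


failed = FailedServerList()
try:
    call("rs1", failed, refuse, now=0.0)  # real attempt fails; rs1 is now marked failed
except ConnectionError:
    pass
try:
    call("rs1", failed, refuse, now=1.0)  # inside the window: fast-fails, no connect attempt
except FailedServerException as e:
    print(e)  # prints "rs1 is in the failed servers list"
```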

This is a pretty clear indicator that the problem lies with that specific server,
not with a single region. In this case I think we should clear the cache for the
entire server.
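A sketch of the proposed policy: on a FailedServerException, evict every cached
location for that server; on any other failure, keep the current per-region
eviction. The dict-based cache, the handle_failure function and the local
FailedServerException class are hypothetical stand-ins written for this
illustration, not HBase's client code.

```python
class FailedServerException(Exception):
    """Stand-in for the fast-fail error raised for servers on the failed list."""


def handle_failure(cache, region, server, error):
    """Evict cached region locations after a failed request.

    cache maps region name -> server name (a hypothetical client meta cache).
    """
    if isinstance(error, FailedServerException):
        # The server itself is known to be unreachable: drop every region it hosts,
        # so they are all re-located on next use instead of failing one by one.
        for r in [r for r, s in cache.items() if s == server]:
            del cache[r]
    else:
        # Ordinary failure: only the region that failed is suspect.
        cache.pop(region, None)


cache = {"r1": "rs1", "r2": "rs1", "r3": "rs2"}
handle_failure(cache, "r1", "rs1", TimeoutError("slow response"))
print(cache)  # prints {'r2': 'rs1', 'r3': 'rs2'}
handle_failure(cache, "r2", "rs1", FailedServerException("rs1 is in the failed servers list"))
print(cache)  # prints {'r3': 'rs2'}
```

The trade-off is deliberate: a server-wide clear is only triggered by a signal
that already implicates the whole server, so the extra meta requests are spent
on regions that really do need to be re-located.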

We recently had a production incident where a server completely hung. We did
see "Clear Region" calls, but the server hosted many regions, so the per-region
meta clears continued for longer than necessary. Adding a "Clear Server" call
on FailedServerException would have mitigated this issue much more quickly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
