[ https://issues.apache.org/jira/browse/HBASE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eungsop Yoo reassigned HBASE-27593:
-----------------------------------
Assignee: Eungsop Yoo
> Clear meta cache for full server when handling FailedServerException
> --------------------------------------------------------------------
>
> Key: HBASE-27593
> URL: https://issues.apache.org/jira/browse/HBASE-27593
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Assignee: Eungsop Yoo
> Priority: Major
>
> Currently we prefer to clear meta cache for an individual region that fails.
> This is preferred in most cases, because clearing cache for an entire server
> is much more expensive. If a server hosts 100 regions, unnecessarily clearing
> the cache for the entire server would cause 100 meta requests per client.
> However, when a client fails to connect to a regionserver, that server is
> added to the FailedServers list. Subsequent requests to that server are
> fast-failed, throwing a FailedServerException.
> This is a pretty clear indicator that there's a problem with a specific
> server. In this case I think we should clear the cache for that full server.
> We had a production incident recently where a server completely hung and we
> did see "Clear Region" calls, but the server hosted many regions and the meta
> clears continued for a while longer than necessary. Adding a "Clear Server"
> call due to FailedServers would have mitigated this issue much more quickly.
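
The proposed policy can be sketched roughly as follows. This is a toy model under stated assumptions, not HBase's actual client code: MetaCacheSketch, its methods, and the nested FailedServerException are hypothetical stand-ins (the real exception is org.apache.hadoop.hbase.ipc.FailedServerException), and the cache is reduced to a region-to-server map.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the client's region-location cache; not HBase's real MetaCache.
class MetaCacheSketch {
  // Hypothetical local stand-in for org.apache.hadoop.hbase.ipc.FailedServerException.
  static class FailedServerException extends IOException {
    FailedServerException(String msg) { super(msg); }
  }

  // Maps region name -> name of the server believed to host it.
  private final Map<String, String> locations = new HashMap<>();

  void cacheLocation(String region, String server) { locations.put(region, server); }

  int size() { return locations.size(); }

  boolean isCached(String region) { return locations.containsKey(region); }

  // Cheap path: forget one region, costing one later meta lookup.
  void clearRegion(String region) { locations.remove(region); }

  // Expensive path: forget every region cached for the given server.
  void clearServer(String server) { locations.values().removeIf(server::equals); }

  // Assumed policy from the issue description: a fast-fail from the
  // FailedServers list points at the whole server, so clear all of its
  // regions; any other error only invalidates the region that failed.
  void onFailure(String region, String server, Exception error) {
    if (error instanceof FailedServerException) {
      clearServer(server);
    } else {
      clearRegion(region);
    }
  }

  public static void main(String[] args) {
    MetaCacheSketch cache = new MetaCacheSketch();
    cache.cacheLocation("r1", "rs1");
    cache.cacheLocation("r2", "rs1");
    cache.cacheLocation("r3", "rs1");
    cache.cacheLocation("r4", "rs2");

    // An ordinary error clears only the failing region.
    cache.onFailure("r1", "rs1", new IOException("call timed out"));
    System.out.println(cache.size()); // 3

    // A fast-fail clears everything cached for rs1 in one step.
    cache.onFailure("r2", "rs1", new FailedServerException("rs1 is in the failed servers list"));
    System.out.println(cache.size()); // 1
  }
}
```

With this policy the client stops rediscovering the dead server one region at a time, while the per-region clear remains the default for every other error.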