[
https://issues.apache.org/jira/browse/HBASE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241377#comment-17241377
]
Duo Zhang commented on HBASE-25343:
-----------------------------------
I think the problem here is that we should try to avoid accessing the failed
meta replica again while it is unavailable.
Of course this is just a nice-to-have; with random selection we can usually
find a healthy replica soon.
If you are still interested, you could look at how we clear the stale cache in
the retrying caller. The logic is in AsyncRpcRetryingCaller.onError, where we
pass in an updateCachedLocation action, which takes a Throwable. In the
implementation of the action, we usually clear the location cache or the stale
master stub, to avoid using the wrong cache next time. Maybe we could do the
same for CatalogReplicaLoadBalanceSelector. For example, in the
AsyncNonMetaRegionLocator.locateInMeta method, after issuing a scan, in the
onError method of the AdvancedScanResultConsumer, we could check whether we are
using the load balance mode; if so, we get the replica id from the scan and
tell the CatalogReplicaLoadBalanceSelector to disable this replica for a
while.
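To make the idea concrete, here is a rough, standalone sketch of the "disable
for a while" bookkeeping. It is not HBase code: the class name
ReplicaPenaltyBox, its methods, the penalty duration and the injectable clock
are all made up for illustration, and the real selector would need to be wired
into locateInMeta and would probably filter on the kind of Throwable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of HBase: remembers which meta replicas
// failed recently and keeps them out of random selection for a while.
final class ReplicaPenaltyBox {
  private final int numReplicas;
  private final long penaltyMillis;
  private final LongSupplier clock; // injectable so the expiry can be tested
  // replica id -> time (ms) until which the replica is considered disabled
  private final Map<Integer, Long> disabledUntil = new ConcurrentHashMap<>();

  ReplicaPenaltyBox(int numReplicas, long penaltyMillis, LongSupplier clock) {
    this.numReplicas = numReplicas;
    this.penaltyMillis = penaltyMillis;
    this.clock = clock;
  }

  // Would be called from the scan's error callback with the replica id that
  // was used. Assumption: every error disables the replica; a real
  // implementation would likely inspect the Throwable first.
  void onError(int replicaId, Throwable error) {
    disabledUntil.put(replicaId, clock.getAsLong() + penaltyMillis);
  }

  boolean isDisabled(int replicaId) {
    Long until = disabledUntil.get(replicaId);
    if (until == null) {
      return false;
    }
    if (clock.getAsLong() >= until) {
      // Penalty expired: make the replica eligible again.
      disabledUntil.remove(replicaId, until);
      return false;
    }
    return true;
  }

  // Random selection among the replicas that are not currently disabled.
  // If all of them are disabled, fall back to plain random selection.
  int select() {
    List<Integer> healthy = new ArrayList<>();
    for (int id = 0; id < numReplicas; id++) {
      if (!isDisabled(id)) {
        healthy.add(id);
      }
    }
    ThreadLocalRandom random = ThreadLocalRandom.current();
    if (healthy.isEmpty()) {
      return random.nextInt(numReplicas);
    }
    return healthy.get(random.nextInt(healthy.size()));
  }
}
```

With something like this, the error callback only needs to call
onError(replicaId, error), and the next select() skips that replica until the
penalty expires.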
Hope this could help.
Thanks.
> Add HA support on top of Load Balance mode
> ------------------------------------------
>
> Key: HBASE-25343
> URL: https://issues.apache.org/jira/browse/HBASE-25343
> Project: HBase
> Issue Type: Sub-task
> Components: meta replicas
> Affects Versions: 2.4.0
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: 2.4.1
>
>
> This is a follow-up enhancement with Stack, Duo. With the newly introduced
> meta replica LoadBalance mode, if there is something wrong with one of the
> meta replica regions, the current logic keeps trying until the meta replica
> region is online again or it reports an error, i.e., there is no HA in
> LoadBalance mode. HA can be implemented if it reports a timeout with one meta
> replica region and tries another meta replica region.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)