[ https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690172#comment-17690172 ]
Duo Zhang commented on HBASE-27650:
-----------------------------------
OK, the problem here is that a new region fully covers an old region. If no
request falls into the old region's range, we never get a chance to clear the
old region from the cache.
I think a possible fix is that, when adding a new entry to the meta cache, we
should also try to remove all cached regions that overlap it.
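A minimal sketch of that idea, not HBase's actual MetaCache code. The class
OverlapEvictingCache and its Region type are hypothetical stand-ins, and row
keys are plain Strings here for brevity (HBase compares byte[] keys). An empty
end key is assumed to mean "end of table".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and types are not taken from HBase.
class OverlapEvictingCache {

    /** Stand-in for RegionInfo: covers [startKey, endKey); "" as endKey means end of table. */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        /** True if the two half-open key ranges share at least one row. */
        boolean overlaps(Region other) {
            boolean startsBeforeOtherEnds =
                other.endKey.isEmpty() || startKey.compareTo(other.endKey) < 0;
            boolean otherStartsBeforeEnd =
                endKey.isEmpty() || other.startKey.compareTo(endKey) < 0;
            return startsBeforeOtherEnds && otherStartsBeforeEnd;
        }

        /** True if row falls inside [startKey, endKey). */
        boolean contains(String row) {
            return startKey.compareTo(row) <= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Cached regions sorted by start key, like the sorted map behind floorEntry().
    private final TreeMap<String, Region> byStartKey = new TreeMap<>();

    /** Caches newRegion after evicting every cached region whose range overlaps it. */
    void add(Region newRegion) {
        byStartKey.values().removeIf(cached -> cached.overlaps(newRegion));
        byStartKey.put(newRegion.startKey, newRegion);
    }

    /** Returns the cached region containing row, or null on a cache miss. */
    Region locate(String row) {
        Map.Entry<String, Region> e = byStartKey.floorEntry(row);
        return (e != null && e.getValue().contains(row)) ? e.getValue() : null;
    }

    int size() {
        return byStartKey.size();
    }
}
```

With regions A, B and C cached and then a merged region [A, end of table)
added, all three old entries are evicted in one step, so a later lookup for C1
resolves to the merged region without another meta scan. The linear removeIf
scan keeps the sketch short; a real cache would presumably only inspect the
entries around the new region's key range.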
> Merging empty regions corrupts meta cache
> -----------------------------------------
>
> Key: HBASE-27650
> URL: https://issues.apache.org/jira/browse/HBASE-27650
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> Let's say you have three regions with start keys A, B, C and all are cached
> in the meta cache. Region B is empty and not getting any requests, and all 3
> regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to
> region C. That region no longer exists, but the MetaCache still returns
> region C, so the request goes out to the server, which throws
> NotServingRegionException.
> That region C is now removed from the cache, and meta is scanned. The meta
> scan returns the newly merged region A, which is cached into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged
> RegionInfo, B still exists with the old/deleted RegionInfo, and C has been
> removed.
> A user submits a request for row C1 again. This _should_ go to region A, but
> we do cache.floorEntry(C1) which returns the old but still cached region B.
> We have checks in MetaCache which validate the RegionInfo.getEndKey() against
> the requested row, and that validation fails because C1 is beyond the endkey
> of the old region. The cached region B result is ignored and cache returns
> null. Meta is scanned, and returns the new region A, which is cached again.
> Requests to rows C1+ will still succeed... but they will always require a
> meta scan because the meta cache will always return that old region B which
> is invalid and doesn't contain the C1+ rows.
> Currently, the only way this will ever resolve is if a request is sent to
> region B, which will cause a NotServingRegionException which will finally
> clear region B from the cache. At that point, requests for C1+ will properly
> get resolved to region A in the cache.
> I've created a reproducible test case here:
> [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.
>
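The lookup sequence in the description can be replayed with a plain sorted
map. This is only an illustrative sketch, separate from the linked test case:
StaleFloorEntryDemo and its Region class are hypothetical, keys are Strings
rather than byte[], and an empty end key is assumed to mean "end of table".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not HBase's MetaCache.
class StaleFloorEntryDemo {

    /** Stand-in for a cached RegionInfo; "" as endKey means end of table. */
    static final class Region {
        final String name;
        final String endKey;

        Region(String name, String endKey) {
            this.name = name;
            this.endKey = endKey;
        }
    }

    /** floorEntry lookup followed by the end-key validation from the description. */
    static Region locate(TreeMap<String, Region> cache, String row) {
        Map.Entry<String, Region> e = cache.floorEntry(row);
        if (e == null) {
            return null;
        }
        Region r = e.getValue();
        boolean pastEnd = !r.endKey.isEmpty() && row.compareTo(r.endKey) >= 0;
        return pastEnd ? null : r;
    }

    /** Cache state after A, B, C were merged and the first C1 request failed. */
    static TreeMap<String, Region> cacheAfterFirstRequest() {
        TreeMap<String, Region> cache = new TreeMap<>();
        cache.put("A", new Region("old A", "B"));
        cache.put("B", new Region("old B", "C"));
        cache.put("C", new Region("old C", ""));
        // NotServingRegionException on old C evicts it; the meta scan result
        // (the merged region, start key A) replaces the old A entry.
        cache.remove("C");
        cache.put("A", new Region("merged A", ""));
        return cache;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> cache = cacheAfterFirstRequest();
        // The stale B entry shadows the merged region for every row >= C.
        System.out.println(cache.floorEntry("C1").getValue().name); // old B
        System.out.println(locate(cache, "C1") == null);            // true
        // Only a failed request to B's own range evicts it.
        cache.remove("B");
        System.out.println(locate(cache, "C1").name);               // merged A
    }
}
```

Until the stale B entry is removed, every locate call for C1 is a miss in this
model, which corresponds to the repeated meta scans described above.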