Bryan Beaudreault created HBASE-27650:
-----------------------------------------
Summary: Merging empty regions corrupts meta cache
Key: HBASE-27650
URL: https://issues.apache.org/jira/browse/HBASE-27650
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
Let's say you have three regions with start keys A, B, and C, all cached in
the MetaCache. Region B is empty and not receiving any requests, and all three
regions are merged together. The new merged region has start key A.
A user submits a request for row C1, which would previously have gone to region
C. That region no longer exists, but the MetaCache still returns the cached
region C, so the request goes out to the server, which throws
NotServingRegionException. Region C is then removed from the cache, and meta is
scanned. The meta scan returns the newly merged region A, which is cached in the
MetaCache.
So now we have a MetaCache where A has been updated with the newly merged
RegionInfo, B still exists with the old/deleted RegionInfo, and C has been
removed.
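To make the resulting cache state concrete, here is a minimal Java sketch that replays the steps above. It assumes a simplified model: `String` row keys in a `java.util.TreeMap` ordered by start key (the real MetaCache works with `byte[]` keys), and a hypothetical `Region` record standing in for RegionInfo. It is not the MetaCache implementation; it only shows which entries survive.

```java
import java.util.TreeMap;

public class MergedRegionCacheState {
    // Hypothetical stand-in for RegionInfo: start key inclusive, end key exclusive,
    // and an empty end key meaning "end of table".
    public record Region(String name, String startKey, String endKey) {}

    // Replays the steps above against a simplified cache keyed by start key.
    public static TreeMap<String, Region> cacheAfterMerge() {
        TreeMap<String, Region> cache = new TreeMap<>();
        cache.put("A", new Region("A-old", "A", "B"));
        cache.put("B", new Region("B-old", "B", "C"));
        cache.put("C", new Region("C-old", "C", ""));

        // The merge happens server-side; nothing in the client cache changes.
        Region merged = new Region("A-merged", "A", "");

        // The request for C1 goes to stale region C, the server throws
        // NotServingRegionException, and the client evicts C.
        cache.remove("C");

        // The meta scan for C1 returns the merged region, cached under start key A.
        // This overwrites A-old but leaves B-old untouched.
        cache.put(merged.startKey(), merged);
        return cache;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> cache = cacheAfterMerge();
        // Print each remaining start key with the region cached under it.
        cache.forEach((start, region) -> System.out.println(start + " -> " + region));
    }
}
```

Running it leaves two entries: the merged region under A and the stale region under B.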
A user submits a request for row C1 again. This _should_ go to region A, but the
lookup does cache.floorEntry(C1), which returns the old but still-cached region
B. We have checks in MetaCache that validate RegionInfo.getEndKey() against the
requested row, and that validation fails because C1 is beyond the end key of the
old region. The cached region B is ignored and the cache returns null. Meta is
scanned and returns the new region A, which is cached again.
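The lookup failure can be sketched as a floor lookup followed by an end-key check. This is a simplified model, not the MetaCache code: `Region` and `containsRow` are hypothetical names, keys are `String`s in a `TreeMap`, and the cache is pre-populated with the state described above (merged A plus stale B).

```java
import java.util.Map;
import java.util.TreeMap;

public class StaleFloorLookup {
    // Hypothetical stand-in for RegionInfo; an empty end key means "end of table".
    public record Region(String name, String startKey, String endKey) {
        boolean containsRow(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Cache contents after the sequence described above: merged A plus stale B.
    public static TreeMap<String, Region> cacheAfterMerge() {
        TreeMap<String, Region> cache = new TreeMap<>();
        cache.put("A", new Region("A-merged", "A", ""));
        cache.put("B", new Region("B-old", "B", "C"));
        return cache;
    }

    // Floor lookup plus end-key check; null means "not cached, go scan meta".
    public static Region locate(TreeMap<String, Region> cache, String row) {
        Map.Entry<String, Region> entry = cache.floorEntry(row);
        if (entry == null) {
            return null;
        }
        Region candidate = entry.getValue();
        return candidate.containsRow(row) ? candidate : null;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> cache = cacheAfterMerge();
        System.out.println(locate(cache, "A5")); // the merged region A
        System.out.println(locate(cache, "C1")); // null: floorEntry picked stale B
    }
}
```

Because floorEntry only looks at the nearest lower start key, the merged region A is never consulted for C1 while B is still cached.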
Requests for rows C1 and beyond will still succeed, but every one of them will
require a meta scan, because the MetaCache keeps returning the old region B,
which is invalid and doesn't contain those rows.
Currently, the only way this resolves is if a request is sent to region B. That
request causes a NotServingRegionException, which finally clears region B from
the cache. At that point, requests for C1 and beyond correctly resolve to region
A in the cache.
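The following Java sketch models both the repeated meta scans and the recovery path. It assumes the same simplified `TreeMap` model as a stand-in for the MetaCache; `Client`, `locate`, `onNotServingRegion`, and the `metaScans` counter are hypothetical names, and the meta scan is faked by always returning the merged region.

```java
import java.util.Map;
import java.util.TreeMap;

public class StaleRegionRecovery {
    // Hypothetical stand-in for RegionInfo; an empty end key means "end of table".
    public record Region(String name, String startKey, String endKey) {
        boolean containsRow(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Hypothetical client model: a start-key-ordered cache plus a meta scan counter.
    public static final class Client {
        final TreeMap<String, Region> cache = new TreeMap<>();
        // What meta returns after the merge: one region covering the whole table.
        final Region metaRegion = new Region("A-merged", "A", "");
        int metaScans = 0;

        Region locate(String row) {
            Map.Entry<String, Region> e = cache.floorEntry(row);
            if (e != null && e.getValue().containsRow(row)) {
                return e.getValue(); // served from the cache
            }
            // Cache miss or stale entry rejected: scan meta and cache the result.
            metaScans++;
            cache.put(metaRegion.startKey(), metaRegion);
            return metaRegion;
        }

        // Models the NotServingRegionException path: evict the rejected region.
        void onNotServingRegion(Region stale) {
            cache.remove(stale.startKey(), stale);
        }
    }

    public static Client clientAfterMerge() {
        Client c = new Client();
        c.cache.put("A", c.metaRegion);
        c.cache.put("B", new Region("B-old", "B", "C"));
        return c;
    }

    public static void main(String[] args) {
        Client c = clientAfterMerge();
        for (String row : new String[] {"C1", "C2", "C3"}) {
            c.locate(row); // each lookup rejects stale B and scans meta
        }
        System.out.println("meta scans before evicting B: " + c.metaScans);

        // A request for a row in B uses the stale entry; the server rejects it.
        c.onNotServingRegion(c.locate("B5"));
        // The retry now resolves to the merged region from the cache.
        c.locate("B5");

        for (String row : new String[] {"C4", "C5"}) {
            c.locate(row); // floorEntry now returns merged A directly
        }
        System.out.println("meta scans after evicting B: " + c.metaScans);
    }
}
```

In this model, the counter grows by one for every C-row request until B is evicted, and stays flat afterwards.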
I've created a reproducible test case here:
[https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
This problem affects both AsyncTable and branch-2's Table.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)