Bryan Beaudreault created HBASE-27650:
-----------------------------------------

             Summary: Merging empty regions corrupts meta cache
                 Key: HBASE-27650
                 URL: https://issues.apache.org/jira/browse/HBASE-27650
             Project: HBase
          Issue Type: Bug
            Reporter: Bryan Beaudreault


Let's say you have three regions with start keys A, B, C and all are cached in 
the meta cache. Region B is empty and not getting any requests, and all 3 
regions are merged together. The new merged region has start key A.

A user submits a request for row C1, which would previously have gone to region 
C. That region no longer exists, but the MetaCache still returns the cached 
region C, so the request goes out to the server, which throws 
NotServingRegionException. Region C is then removed from the cache and meta is 
scanned. The meta scan returns the newly merged region A, which is cached into 
the MetaCache.

So now we have a MetaCache where A has been updated with the newly merged 
RegionInfo, B still exists with the old/deleted RegionInfo, and C has been 
removed.

A user submits a request for row C1 again. This _should_ go to region A, but 
cache.floorEntry(C1) returns the old but still cached region B. The checks in 
MetaCache validate RegionInfo.getEndKey() against the requested row, and that 
validation fails because C1 is beyond the end key of the old region B. The 
cached region B result is ignored and the cache returns null. Meta is scanned 
and returns the new region A, which is cached again.

Requests to rows C1+ will still succeed... but they will always require a meta 
scan because the meta cache will always return that old region B which is 
invalid and doesn't contain the C1+ rows.

Currently, the only way this resolves is if a request is sent to region B. 
That request fails with a NotServingRegionException, which finally clears 
region B from the cache. At that point, requests for C1+ are properly resolved 
to region A from the cache.
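
The sequence above can be sketched outside of HBase with a plain sorted map. 
This is only an illustration, not the actual MetaCache code: the class 
StaleMetaCacheSketch, its Region type, and the locate/onNotServing methods are 
hypothetical names, and a TreeMap keyed by start key is assumed as a stand-in 
for the client cache. The linked gist remains the real reproduction.

```java
import java.util.Map;
import java.util.TreeMap;

public class StaleMetaCacheSketch {
    /** Hypothetical stand-in for RegionInfo: covers [startKey, endKey); empty endKey = end of table. */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Assumed model of the client cache: region start key -> cached region.
    private final TreeMap<String, Region> cache = new TreeMap<>();
    // What a meta scan returns after A, B and C have been merged.
    private final Region merged = new Region("A-merged", "A", "");
    int metaScans = 0;

    StaleMetaCacheSketch() {
        // Pre-merge state: all three regions are cached.
        cache.put("A", new Region("A", "A", "B"));
        cache.put("B", new Region("B", "B", "C"));
        cache.put("C", new Region("C", "C", ""));
    }

    /** Cache lookup: floorEntry, then validate the end key against the row. */
    Region cached(String row) {
        Map.Entry<String, Region> e = cache.floorEntry(row);
        if (e == null || !e.getValue().contains(row)) {
            return null;
        }
        return e.getValue();
    }

    /** Returns the cached region, or simulates a meta scan and caches its result. */
    Region locate(String row) {
        Region r = cached(row);
        if (r != null) {
            return r;
        }
        metaScans++;
        cache.put(merged.startKey, merged);
        return merged;
    }

    /** Simulates the client reacting to NotServingRegionException for a stale region. */
    void onNotServing(Region stale) {
        cache.remove(stale.startKey);
    }

    public static void main(String[] args) {
        StaleMetaCacheSketch c = new StaleMetaCacheSketch();
        // First request for C1 hits the old region C, which the server rejects.
        c.onNotServing(c.locate("C1"));
        // Every later request for C1 finds stale B via floorEntry and falls back to meta.
        for (int i = 0; i < 3; i++) {
            c.locate("C1");
        }
        System.out.println("meta scans while B is stale: " + c.metaScans);
        // Only a request that targets B clears the stale entry.
        c.onNotServing(c.locate("B1"));
        c.locate("C1");
        System.out.println("meta scans after B is evicted: " + c.metaScans);
    }
}
```

In this model the scan counter grows by one for every C1 request until the 
stale B entry is removed, after which C1 resolves to the merged region straight 
from the cache.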

I've created a reproducible test case here: 
[https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]

This problem affects both AsyncTable and branch-2's Table.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
