bbeaudreault commented on code in PR #5037:
URL: https://github.com/apache/hbase/pull/5037#discussion_r1111103672
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java:
##########
@@ -217,6 +217,223 @@ private boolean tryComplete(LocateRequest req, CompletableFuture<RegionLocations
}
}
+ /**
+ * Wrapper around ConcurrentSkipListMap ensuring proper access to cached items. Updates are
+ * synchronized, but reads are not.
+ */
+ private static final class RegionLocationCache {
+
+ private final ConcurrentNavigableMap<byte[], RegionLocations> cache =
+ new ConcurrentSkipListMap<>(BYTES_COMPARATOR);
+
+ /**
+ * Add the given locations to the cache, merging with existing if necessary. Also cleans out any
+ * previously cached locations which may have been superseded by this one (i.e. in case of
+ * merged regions). See {@link #cleanOverlappingRegions(RegionLocations)}
+ * @param locs the locations to cache
+ * @return the final location (possibly merged) that was added to the cache
+ */
+ public synchronized RegionLocations add(RegionLocations locs) {
+ byte[] startKey = locs.getRegionLocation().getRegion().getStartKey();
+ RegionLocations oldLocs = cache.putIfAbsent(startKey, locs);
+ if (oldLocs == null) {
+ cleanOverlappingRegions(locs);
+ return locs;
+ }
+
+ // check whether the regions are the same; this usually happens when the table is split/merged,
+ // or deleted and recreated again.
+ RegionInfo region = locs.getRegionLocation().getRegion();
+ RegionInfo oldRegion = oldLocs.getRegionLocation().getRegion();
+ if (region.getEncodedName().equals(oldRegion.getEncodedName())) {
+ RegionLocations mergedLocs = oldLocs.mergeLocations(locs);
+ if (isEqual(mergedLocs, oldLocs)) {
+ // the merged one is the same with the old one, give up
+ LOG.trace("Will not add {} to cache because the old value {}"
+ + " is newer than us or has the same server name."
+ + " Maybe it was updated before we could replace it", locs, oldLocs);
+ return oldLocs;
+ }
+ locs = mergedLocs;
+ } else {
+ // the region is different, here we trust the one we fetched. This may be wrong, but the
+ // upper layer can eventually detect this and trigger removal of the wrong locations
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("The newly fetched region {} is different from the old one {} for row '{}',"
+ + " try replacing the old one...", region, oldRegion, Bytes.toStringBinary(startKey));
+ }
+ }
+
+ cache.put(startKey, locs);
+ cleanOverlappingRegions(locs);
+ return locs;
+ }
+
+ /**
+ * When caching a location, the region may have been the result of a merge. Check to see if the
+ * region's boundaries overlap any other cached locations. Those would have been merge parents
+ * which no longer exist. We need to proactively clear them out to avoid a case where a merged
+ * region which receives no requests never gets cleared. This causes requests to other merged
+ * regions after it to see the wrong cached location. See HBASE-27650
+ * @param locations the new location that was just cached
+ */
+ private void cleanOverlappingRegions(RegionLocations locations) {
+ RegionInfo region = locations.getRegionLocation().getRegion();
+
+ boolean isLast = isEmptyStopRow(region.getEndKey());
+
+ while (true) {
Review Comment:
OK, that makes sense, but I'm not sure it's an issue. Just to be clear, I'm open
to making a change here. I'm just trying to think through this, so please bear
with me.
The problem we are trying to solve is due to how we use floorEntry to query
the cache. Using floorEntry opens us to a problem where a stale cache entry
with startKey _greater than_ the correct one can cause the correct one to never
be returned. The current solution solves that.
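To make the shadowing concrete, here's a minimal, self-contained sketch (using `String` keys and illustrative region names in place of HBase's `byte[]` keys and `RegionLocations`; the ordering argument is identical):

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class FloorEntryShadowing {
    public static void main(String[] args) {
        // Simplified model of the location cache: region startKey -> location.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();

        // Two old regions [a,m) and [m,z) were merged into one region [a,z).
        cache.put("a", "merged[a,z)"); // the fresh, correct location
        cache.put("m", "stale[m,z)");  // stale merge parent left behind

        // A lookup for row "q" should hit the merged region, but floorEntry
        // returns the entry with the greatest key <= "q": the stale one.
        System.out.println(cache.floorEntry("q").getValue()); // stale[m,z)
    }
}
```

As long as the stale entry with the greater startKey remains, every row it covers resolves to it instead of the correct merged region.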
Since we use floorEntry, I don't think stale entries with startKey _less
than_ the correct location are really a problem. They would exist in cache but
not cause any issues. If a request went to them they would be cleared. If they
got overlapped by another region they'd be cleaned up at that point. Assuming a
relatively active table, it would all clean up over time as different regions
get requested.
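The benign case can be sketched the same way (again with hypothetical `String` keys standing in for `byte[]`): a stale entry whose startKey is *less than* the correct one never wins the `floorEntry` race for rows in the correct region.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class StaleLowerStartKey {
    public static void main(String[] args) {
        // Simplified model of the location cache: region startKey -> location.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();
        cache.put("a", "stale[a,c)");   // stale entry with a smaller startKey
        cache.put("c", "current[c,z)"); // the correct, current location

        // For any row in [c,z), floorEntry lands on the greater key "c",
        // so the stale entry is never consulted for those rows.
        System.out.println(cache.floorEntry("q").getValue()); // current[c,z)
    }
}
```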
----
That said, I did think about how to do it. All entries are indexed by
startKey, but we're concerned about endKey. We could pretty easily check the
entry just prior to the cached location. But that doesn't cover us.
Theoretically even the first entry in the cache could have an endKey that
overlaps.
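A quick sketch of why the prior-entry check is insufficient (a hypothetical model mapping startKey to endKey as `String`s): the entry immediately before the cached location can look clean while an entry further toward the head still overlaps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class PriorEntryCheck {
    public static void main(String[] args) {
        // Hypothetical model: region startKey -> region endKey.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();
        cache.put("a", "z"); // stale head entry covering [a,z)
        cache.put("f", "m"); // fresh region [f,m)
        cache.put("m", "p"); // region just cached: [m,p)

        // The entry immediately prior to [m,p) is [f,m): its endKey "m" does
        // not extend past startKey "m", so the check reports no overlap.
        Map.Entry<String, String> prior = cache.lowerEntry("m");
        System.out.println(prior.getValue().compareTo("m") > 0); // false

        // But the head entry [a,z) still overlaps [m,p), since "z" > "m".
        System.out.println(cache.firstEntry().getValue().compareTo("m") > 0); // true
    }
}
```

Catching the head entry here would require walking every entry from "m" back to the head, which is the cost concern below.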
So the only way to be fully sure there are no overlaps is to check _all_
entries up to the head of the cache. I don't think this is worth it: a table
can have many thousands of regions, and there can sometimes be bursts of
regions being cached, each of which would have to scan to the head.
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]