Apache9 commented on code in PR #5037:
URL: https://github.com/apache/hbase/pull/5037#discussion_r1110524517
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java:
##########
@@ -299,40 +299,86 @@ private RegionLocations addToCache(TableCache tableCache, RegionLocations locs)
LOG.trace("Try adding {} to cache", locs);
byte[] startKey = locs.getRegionLocation().getRegion().getStartKey();
for (;;) {
- RegionLocations oldLocs = tableCache.cache.putIfAbsent(startKey, locs);
- if (oldLocs == null) {
- return locs;
- }
- // check whether the regions are the same, this usually happens when table is split/merged, or
- // deleted and recreated again.
- RegionInfo region = locs.getRegionLocation().getRegion();
- RegionInfo oldRegion = oldLocs.getRegionLocation().getRegion();
- if (region.getEncodedName().equals(oldRegion.getEncodedName())) {
- RegionLocations mergedLocs = oldLocs.mergeLocations(locs);
- if (isEqual(mergedLocs, oldLocs)) {
- // the merged one is the same with the old one, give up
- LOG.trace("Will not add {} to cache because the old value {} "
- + " is newer than us or has the same server name."
- + " Maybe it is updated before we replace it", locs, oldLocs);
- return oldLocs;
- }
- if (tableCache.cache.replace(startKey, oldLocs, mergedLocs)) {
- return mergedLocs;
- }
- } else {
- // the region is different, here we trust the one we fetched. This may be wrong but finally
- // the upper layer can detect this and trigger removal of the wrong locations
- if (LOG.isDebugEnabled()) {
- LOG.debug("The newly fetched region {} is different from the old one {} for row '{}',"
- + " try replacing the old one...", region, oldRegion, Bytes.toStringBinary(startKey));
- }
- if (tableCache.cache.replace(startKey, oldLocs, locs)) {
+ // synchronize here because we may need to make multiple modifications in
+ // cleanOverlappingRegions, and we want them to be atomic
+ synchronized (tableCache) {
+ RegionLocations oldLocs = tableCache.cache.putIfAbsent(startKey, locs);
+ if (oldLocs == null) {
+ cleanOverlappingRegions(locs, tableCache);
return locs;
}
+ // check whether the regions are the same, this usually happens when table is split/merged, or
+ // deleted and recreated again.
+ RegionInfo region = locs.getRegionLocation().getRegion();
+ RegionInfo oldRegion = oldLocs.getRegionLocation().getRegion();
+ if (region.getEncodedName().equals(oldRegion.getEncodedName())) {
+ RegionLocations mergedLocs = oldLocs.mergeLocations(locs);
+ if (isEqual(mergedLocs, oldLocs)) {
+ // the merged one is the same with the old one, give up
+ LOG.trace("Will not add {} to cache because the old value {} "
+ + " is newer than us or has the same server name."
+ + " Maybe it is updated before we replace it", locs, oldLocs);
+ return oldLocs;
+ }
+ if (tableCache.cache.replace(startKey, oldLocs, mergedLocs)) {
Review Comment:
If we are already inside the synchronized block, then I do not think we still need the
compare-and-replace; a plain put would be enough.
And I think we'd better abstract a data structure for holding the cache, i.e., introduce a
class called something like RegionLocationsCache (or another name which indicates that it
is just a data structure, sorry, not a native speaker...). The internal data structure
would still be a CSLM, but the upper layer could not access the CSLM directly; instead we
would expose several methods, such as lookup, add, remove, clear, etc., for the upper
layer to use.
In this way, we are free to use synchronized methods for now, and later we could try to
see if it is possible to make it all lock free.
I'm a bit worried about the current usage, as we just expose the CSLM directly; it is
easy for other developers to misuse...
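The wrapper the reviewer is proposing could look roughly like the following. This is a
hypothetical sketch, not HBase code: the class name `RegionLocationsCache` and the
`String` stand-in for the `RegionLocations` value type are assumptions, and the real
`add` would also merge with any existing entry and clean overlapping regions under the
same lock.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: the ConcurrentSkipListMap is private, and callers go
// through lookup/add/remove/clear instead of touching the CSLM directly.
final class RegionLocationsCache {

  private final ConcurrentSkipListMap<byte[], String> cache =
    new ConcurrentSkipListMap<>(Arrays::compare);

  // Because the method itself is synchronized, a plain put suffices; no
  // putIfAbsent/replace (compare-and-swap) retry loop is needed. The real
  // implementation would also merge with the old entry and clean overlapping
  // regions here, atomically under the same lock.
  public synchronized String add(byte[] startKey, String locs) {
    cache.put(startKey, locs);
    return locs;
  }

  // Reads can stay lock-free: CSLM lookups are safe without the monitor.
  // floorEntry finds the region whose start key is <= the given row.
  public String lookup(byte[] row) {
    Map.Entry<byte[], String> entry = cache.floorEntry(row);
    return entry == null ? null : entry.getValue();
  }

  public synchronized String remove(byte[] startKey) {
    return cache.remove(startKey);
  }

  public synchronized void clear() {
    cache.clear();
  }
}
```

With this shape, the synchronized keyword can later be dropped method by method if a
lock-free scheme is worked out, without changing any caller.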
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]