bbeaudreault commented on code in PR #5037:
URL: https://github.com/apache/hbase/pull/5037#discussion_r1111103672
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java:
##########
@@ -217,6 +217,223 @@ private boolean tryComplete(LocateRequest req, CompletableFuture<RegionLocations
}
}
+ /**
+ * Wrapper around ConcurrentSkipListMap ensuring proper access to cached items. Updates are
+ * synchronized, but reads are not.
+ */
+ private static final class RegionLocationCache {
+
+ private final ConcurrentNavigableMap<byte[], RegionLocations> cache =
+ new ConcurrentSkipListMap<>(BYTES_COMPARATOR);
+
+ /**
+ * Add the given locations to the cache, merging with existing if necessary. Also cleans out any
+ * previously cached locations which may have been superseded by this one (i.e. in case of
+ * merged regions). See {@link #cleanOverlappingRegions(RegionLocations)}
+ * @param locs the locations to cache
+ * @return the final location (possibly merged) that was added to the cache
+ */
+ public synchronized RegionLocations add(RegionLocations locs) {
+ byte[] startKey = locs.getRegionLocation().getRegion().getStartKey();
+ RegionLocations oldLocs = cache.putIfAbsent(startKey, locs);
+ if (oldLocs == null) {
+ cleanOverlappingRegions(locs);
+ return locs;
+ }
+
+ // check whether the regions are the same; this usually happens when the table is split/merged,
+ // or deleted and recreated again.
+ RegionInfo region = locs.getRegionLocation().getRegion();
+ RegionInfo oldRegion = oldLocs.getRegionLocation().getRegion();
+ if (region.getEncodedName().equals(oldRegion.getEncodedName())) {
+ RegionLocations mergedLocs = oldLocs.mergeLocations(locs);
+ if (isEqual(mergedLocs, oldLocs)) {
+ // the merged one is the same with the old one, give up
+ LOG.trace("Will not add {} to cache because the old value {}"
+ + " is newer than us or has the same server name."
+ + " Maybe it was updated before we could replace it", locs, oldLocs);
+ return oldLocs;
+ }
+ locs = mergedLocs;
+ } else {
+ // the region is different, here we trust the one we fetched. This may be wrong, but the
+ // upper layer can eventually detect this and trigger removal of the wrong locations
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("The newly fetched region {} is different from the old one {} for row '{}',"
+ + " try replacing the old one...", region, oldRegion, Bytes.toStringBinary(startKey));
+ }
+ }
+
+ cache.put(startKey, locs);
+ cleanOverlappingRegions(locs);
+ return locs;
+ }
+
+ /**
+ * When caching a location, the region may have been the result of a merge. Check to see if the
+ * region's boundaries overlap any other cached locations. Those would have been merge parents
+ * which no longer exist. We need to proactively clear them out to avoid a case where a merged
+ * region which receives no requests never gets cleared. This causes requests to other merged
+ * regions after it to see the wrong cached location. See HBASE-27650
+ * @param locations the new location that was just cached
+ */
+ private void cleanOverlappingRegions(RegionLocations locations) {
+ RegionInfo region = locations.getRegionLocation().getRegion();
+
+ boolean isLast = isEmptyStopRow(region.getEndKey());
+
+ while (true) {
Review Comment:
OK, that makes sense, but I'm not sure it's an issue. Just to be clear, I'm open
to making a change here. I'm just trying to think through this, so please bear
with me.
The problem we are trying to solve is due to how we use floorEntry to query
the cache. Using floorEntry opens us to a problem where a stale cache entry
with startKey _greater than_ the correct one can cause the correct one to never
be returned. The current solution solves that.
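To make the shadowing concrete, here's a minimal, self-contained sketch (using `String` keys and illustrative region names in place of HBase's `byte[]` keys and `RegionLocations`; the ordering argument is identical):

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class FloorEntryShadowing {
    public static void main(String[] args) {
        // Simplified model of the location cache: region startKey -> location.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();

        // Two old regions [a,m) and [m,z) were merged into one region [a,z).
        cache.put("a", "merged[a,z)"); // the fresh, correct location
        cache.put("m", "stale[m,z)");  // stale merge parent left behind

        // A lookup for row "q" should hit the merged region, but floorEntry
        // returns the entry with the greatest key <= "q": the stale one.
        System.out.println(cache.floorEntry("q").getValue()); // stale[m,z)
    }
}
```

As long as the stale entry with the greater startKey remains, every row it covers resolves to it instead of the correct merged region.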
Since we use floorEntry, I don't think stale entries with startKey _less
than_ the correct location are really a problem. They would exist in cache but
not cause any issues. If a request went to them they would be cleared. If they
got overlapped by another region they'd be cleaned up at that point. Assuming a
relatively active table, it would all clean up over time as different regions
get requested.
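The benign case can be sketched the same way (again with hypothetical `String` keys standing in for `byte[]`): a stale entry whose startKey is *less than* the correct one never wins the `floorEntry` race for rows in the correct region.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class StaleLowerStartKey {
    public static void main(String[] args) {
        // Simplified model of the location cache: region startKey -> location.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();
        cache.put("a", "stale[a,c)");   // stale entry with a smaller startKey
        cache.put("c", "current[c,z)"); // the correct, current location

        // For any row in [c,z), floorEntry lands on the greater key "c",
        // so the stale entry is never consulted for those rows.
        System.out.println(cache.floorEntry("q").getValue()); // current[c,z)
    }
}
```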
----
That said, I did think about how to do it. All entries are indexed by
startKey, but we're concerned about endKey. We could pretty easily check the
entry just prior to the cached location. But that doesn't cover us.
Theoretically even the first entry in the cache could have an endKey that
overlaps.
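A quick sketch of why the prior-entry check is insufficient (a hypothetical model mapping startKey to endKey as `String`s): the entry immediately before the cached location can look clean while an entry further toward the head still overlaps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class PriorEntryCheck {
    public static void main(String[] args) {
        // Hypothetical model: region startKey -> region endKey.
        ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();
        cache.put("a", "z"); // stale head entry covering [a,z)
        cache.put("f", "m"); // fresh region [f,m)
        cache.put("m", "p"); // region just cached: [m,p)

        // The entry immediately prior to [m,p) is [f,m): its endKey "m" does
        // not extend past startKey "m", so the check reports no overlap.
        Map.Entry<String, String> prior = cache.lowerEntry("m");
        System.out.println(prior.getValue().compareTo("m") > 0); // false

        // But the head entry [a,z) still overlaps [m,p), since "z" > "m".
        System.out.println(cache.firstEntry().getValue().compareTo("m") > 0); // true
    }
}
```

Catching the head entry here would require walking every entry from "m" back to the head, which is the cost concern below.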
So the only way to be fully sure there are no overlaps is to check _all_
entries up to the head of the cache. I don't think this is worth it: a table
can have many thousands of regions, and there can sometimes be bursts of
regions being cached, each of which would have to scan to the head.
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]