ankitsinghal commented on code in PR #5341:
URL: https://github.com/apache/hbase/pull/5341#discussion_r1290343643


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:
##########
@@ -589,6 +588,12 @@ public Cacheable getBlock(BlockCacheKey key, boolean caching, boolean repeat,
           }
           return cachedBlock;
         }
+      } catch (HBaseIOException hioex) {
+        // When using file io engine persistent cache,
+        // the cache map state might differ from the actual cache. If we reach this block,
+        // we should remove the cache key entry from the backing map
+        backingMap.remove(key);
+        LOG.debug("Failed to fetch block for cache key: {}.", key, hioex);

Review Comment:
   I'm not able to find which part of the code throws HBaseIOException.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/PersistentIOEngine.java:
##########
@@ -70,6 +70,7 @@ protected byte[] calculateChecksum(String algorithm) {
         sb.append(getFileSize(filePath));
         sb.append(file.lastModified());
       }
+      LOG.debug("Checksum for persistence cache: {}", sb);

Review Comment:
   nit: Not sure if useful, can be removed



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:
##########
@@ -1252,16 +1259,19 @@ static List<RAMQueueEntry> getRAMQueueEntries(BlockingQueue<RAMQueueEntry> q,
  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "OBL_UNSATISFIED_OBLIGATION",
      justification = "false positive, try-with-resources ensures close is called.")
   void persistToFile() throws IOException {
-    if (!ioEngine.isPersistent()) {
+    if (!isCachePersistent()) {
       throw new IOException("Attempt to persist non-persistent cache mappings!");
     }
-    try (FileOutputStream fos = new FileOutputStream(persistencePath, false)) {
+    File tempPersistencePath = new File(persistencePath + EnvironmentEdgeManager.currentTime());
+    try (FileOutputStream fos = new FileOutputStream(tempPersistencePath, false)) {
       fos.write(ProtobufMagic.PB_MAGIC);
       BucketProtoUtils.toPB(this).writeDelimitedTo(fos);
     }
-    if (prefetchedFileListPath != null) {
-      PrefetchExecutor.persistToFile(prefetchedFileListPath);
-    }
+    tempPersistencePath.renameTo(new File(persistencePath));

Review Comment:
   Do you want to WARN if rename is not successful for some reason? 
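   Something along these lines, maybe. A simplified, self-contained sketch (`persistAtomically` and the plain `String`/`byte[]` types are stand-ins I made up, not the PR's actual fields); the point is that `java.io.File#renameTo` signals failure by returning `false` rather than throwing, so the result has to be checked explicitly if we want a WARN:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class RenameCheck {

  // java.io.File#renameTo returns false on failure instead of throwing,
  // so the boolean result must be checked if we want to log a warning.
  public static boolean persistAtomically(File temp, File target, byte[] data)
      throws IOException {
    try (FileOutputStream fos = new FileOutputStream(temp, false)) {
      fos.write(data);
    }
    boolean renamed = temp.renameTo(target);
    if (!renamed) {
      // stand-in for LOG.warn in the real code
      System.err.println("WARN: failed to rename " + temp + " to " + target
        + "; persisted cache state may be stale or missing on restart");
    }
    return renamed;
  }

  public static void main(String[] args) throws IOException {
    File temp = File.createTempFile("persist", ".tmp");
    File target = new File(temp.getParent(), "persist-demo.final");
    target.delete(); // clear leftovers from a previous run
    System.out.println(persistAtomically(temp, target, new byte[] { 1, 2, 3 }));
    target.delete();
  }
}
```

   An alternative would be `Files.move(temp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE)`, which throws a descriptive `IOException` on failure instead of silently returning `false`.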



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:
##########
@@ -1358,16 +1365,37 @@ private void verifyCapacityAndClasses(long capacitySize, String ioclass, String
   }
 
   private void parsePB(BucketCacheProtos.BucketCacheEntry proto) throws IOException {
+    backingMap = BucketProtoUtils.fromPB(proto.getDeserializersMap(), proto.getBackingMap(),
+      this::createRecycler);
+    prefetchCompleted.clear();
+    prefetchCompleted.putAll(proto.getPrefetchedFilesMap());
     if (proto.hasChecksum()) {
-      ((PersistentIOEngine) ioEngine).verifyFileIntegrity(proto.getChecksum().toByteArray(),
-        algorithm);
+      try {
+        ((PersistentIOEngine) ioEngine).verifyFileIntegrity(proto.getChecksum().toByteArray(),
+          algorithm);
+      } catch (IOException e) {
+        LOG.warn("Checksum for cache file failed. "
+          + "We need to validate each cache key in the backing map. This may take some time...");
+        long startTime = EnvironmentEdgeManager.currentTime();
+        int totalKeysOriginally = backingMap.size();
+        for (Map.Entry<BlockCacheKey, BucketEntry> keyEntry : backingMap.entrySet()) {
+          try {
+            ((FileIOEngine) ioEngine).checkCacheTime(keyEntry.getValue());
+          } catch (IOException e1) {
+            LOG.debug("Check for key {} failed. Removing it from map.", keyEntry.getKey());
+            backingMap.remove(keyEntry.getKey());
+            prefetchCompleted.remove(keyEntry.getKey().getHfileName());

Review Comment:
   Does removing all of a file's entries when a single block is missing make sense? Could we do some % check here instead, only removing the file if we are missing a certain % of its blocks?
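   Roughly what I have in mind (a toy sketch, not the real `BlockCacheKey`/`BucketEntry` types: `blockToFile` maps a block key to its hfile name, and `isValid` stands in for the per-entry check such as `checkCacheTime`; all names here are hypothetical):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class PartialEvictionSketch {

  // Percentage-based check: count bad blocks per file and only evict the
  // whole file once the bad fraction exceeds maxBadFraction, instead of
  // dropping the file on the first bad block.
  public static Set<String> filesToEvict(Map<String, String> blockToFile,
      Predicate<String> isValid, double maxBadFraction) {
    Map<String, Integer> totalPerFile = new HashMap<>();
    Map<String, Integer> badPerFile = new HashMap<>();
    for (Map.Entry<String, String> e : blockToFile.entrySet()) {
      totalPerFile.merge(e.getValue(), 1, Integer::sum);
      if (!isValid.test(e.getKey())) {
        badPerFile.merge(e.getValue(), 1, Integer::sum);
      }
    }
    Set<String> evict = new HashSet<>();
    for (Map.Entry<String, Integer> e : badPerFile.entrySet()) {
      if ((double) e.getValue() / totalPerFile.get(e.getKey()) > maxBadFraction) {
        evict.add(e.getKey());
      }
    }
    return evict;
  }
}
```

   The threshold could then be a config knob, defaulting to the current behavior (evict on any bad block) with `maxBadFraction = 0`.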



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:
##########
@@ -1358,16 +1365,37 @@ private void verifyCapacityAndClasses(long capacitySize, String ioclass, String
   }
 
   private void parsePB(BucketCacheProtos.BucketCacheEntry proto) throws IOException {
+    backingMap = BucketProtoUtils.fromPB(proto.getDeserializersMap(), proto.getBackingMap(),
+      this::createRecycler);
+    prefetchCompleted.clear();
+    prefetchCompleted.putAll(proto.getPrefetchedFilesMap());
     if (proto.hasChecksum()) {
-      ((PersistentIOEngine) ioEngine).verifyFileIntegrity(proto.getChecksum().toByteArray(),
-        algorithm);
+      try {
+        ((PersistentIOEngine) ioEngine).verifyFileIntegrity(proto.getChecksum().toByteArray(),
+          algorithm);
+      } catch (IOException e) {
+        LOG.warn("Checksum for cache file failed. "
+          + "We need to validate each cache key in the backing map. This may take some time...");
+        long startTime = EnvironmentEdgeManager.currentTime();
+        int totalKeysOriginally = backingMap.size();
+        for (Map.Entry<BlockCacheKey, BucketEntry> keyEntry : backingMap.entrySet()) {
+          try {
+            ((FileIOEngine) ioEngine).checkCacheTime(keyEntry.getValue());

Review Comment:
   It looks like we will check every entry; with a 30GB backing map for a 2TB bucket cache, that could take quite a while. Did we test with a 2TB bucket cache to determine how much time it takes to get the RS live?
   
   If it is not performant, we should make this check optional and verify the blocks when they are read or during the eviction scan.
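   The verify-on-read variant could look roughly like this. A toy sketch with `String` keys and `byte[]` values rather than the real `BlockCacheKey`/`BucketEntry` types, and `check` standing in for the per-entry validation (e.g. `checkCacheTime`); all names are hypothetical:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class LazyVerifyCache {

  private final Map<String, byte[]> backing = new ConcurrentHashMap<>();
  private final Set<String> unverified = ConcurrentHashMap.newKeySet();
  private final Predicate<String> check;

  // Every restored key starts out unverified, so startup stays O(1) per key
  // instead of doing one IO-engine check per entry before the RS goes live.
  public LazyVerifyCache(Map<String, byte[]> restored, Predicate<String> check) {
    backing.putAll(restored);
    unverified.addAll(restored.keySet());
    this.check = check;
  }

  // Verify an entry the first time it is read; drop it if it turns out stale.
  public byte[] get(String key) {
    if (unverified.remove(key) && !check.test(key)) {
      backing.remove(key);
      return null; // caller falls through to a cache miss
    }
    return backing.get(key);
  }
}
```

   The eviction scan could drain the remaining `unverified` set in the background so that cold entries also get validated eventually.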



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
