Re: [PR] HBASE-28527: Adjust BlockCacheKey to use the file path instead of file name. [hbase]

via GitHub Thu, 18 Apr 2024 03:16:42 -0700


wchevreuil commented on code in PR #5832:
URL: https://github.com/apache/hbase/pull/5832#discussion_r1570397180



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:
##########
@@ -682,7 +683,7 @@ public void fileCacheCompleted(Path filePath, long size) {
   }
 
   private void updateRegionCachedSize(Path filePath, long cachedSize) {
-    if (filePath != null) {
+    if (filePath != null && filePath.getParent() != null && 
filePath.getParent().getParent() != null) {

Review Comment:
   Is this really possible? 



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java:
##########
@@ -39,24 +39,16 @@ public class BlockCacheKey implements HeapSize, 
java.io.Serializable {
    * @param hfileName The name of the HFile this block belongs to.
    * @param offset    Offset of the block into the file
    */
-  public BlockCacheKey(String hfileName, long offset) {
-    this(hfileName, offset, true, BlockType.DATA);
-  }
-
-  public BlockCacheKey(String hfileName, long offset, boolean isPrimaryReplica,
-    BlockType blockType) {
-    this.isPrimaryReplicaBlock = isPrimaryReplica;
-    this.hfileName = hfileName;
-    this.offset = offset;
-    this.blockType = blockType;
+  public BlockCacheKey(Path hfilePath, long offset) {
+    this(hfilePath, offset, true, BlockType.DATA);
   }
 
   public BlockCacheKey(Path hfilePath, long offset, boolean isPrimaryReplica, 
BlockType blockType) {
-    this.filePath = hfilePath;
     this.isPrimaryReplicaBlock = isPrimaryReplica;

Review Comment:
   nit: unneeded change.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketProtoUtils.java:
##########
@@ -130,10 +136,30 @@ static Pair<ConcurrentHashMap<BlockCacheKey, 
BucketEntry>, NavigableSet<BlockCac
     ConcurrentHashMap<BlockCacheKey, BucketEntry> result = new 
ConcurrentHashMap<>();
     NavigableSet<BlockCacheKey> resultSet = new 
ConcurrentSkipListSet<>(Comparator
       
.comparing(BlockCacheKey::getHfileName).thenComparingLong(BlockCacheKey::getOffset));
+
+    Map<String, Path> allFilePaths = null;
+    DataTieringManager dataTieringManager;
+    try {
+      dataTieringManager = DataTieringManager.getInstance();
+      allFilePaths = dataTieringManager.getAllFilesList();
+    } catch (IllegalStateException e) {
+      // Data-Tiering manager has not been set up.
+      // Ignore the error and proceed with the normal flow.
+      LOG.warn("Error while getting DataTieringManager instance: {}", 
e.getMessage());
+    }
+
     for (BucketCacheProtos.BackingMapEntry entry : backingMap.getEntryList()) {
       BucketCacheProtos.BlockCacheKey protoKey = entry.getKey();
-      BlockCacheKey key = new BlockCacheKey(protoKey.getHfilename(), 
protoKey.getOffset(),
-        protoKey.getPrimaryReplicaBlock(), fromPb(protoKey.getBlockType()));
+
+      BlockCacheKey key = null;
+      if (allFilePaths != null) {
+        key = new BlockCacheKey(allFilePaths.get(protoKey.getHfilename()), 
protoKey.getOffset(),

Review Comment:
   ` allFilePaths.get(protoKey.getHfilename())` may yield a null value, if we 
are recovering from a crash where we didn't have a chance to save the 
persistent index after a given file may have been removed. 



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketProtoUtils.java:
##########
@@ -130,10 +136,30 @@ static Pair<ConcurrentHashMap<BlockCacheKey, 
BucketEntry>, NavigableSet<BlockCac
     ConcurrentHashMap<BlockCacheKey, BucketEntry> result = new 
ConcurrentHashMap<>();
     NavigableSet<BlockCacheKey> resultSet = new 
ConcurrentSkipListSet<>(Comparator
       
.comparing(BlockCacheKey::getHfileName).thenComparingLong(BlockCacheKey::getOffset));
+
+    Map<String, Path> allFilePaths = null;
+    DataTieringManager dataTieringManager;
+    try {
+      dataTieringManager = DataTieringManager.getInstance();
+      allFilePaths = dataTieringManager.getAllFilesList();
+    } catch (IllegalStateException e) {
+      // Data-Tiering manager has not been set up.
+      // Ignore the error and proceed with the normal flow.
+      LOG.warn("Error while getting DataTieringManager instance: {}", 
e.getMessage());

Review Comment:
   Is this going to be thrown whenever data tiering type isn't set to 
TIME_RANGE?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java:
##########
@@ -39,24 +39,16 @@ public class BlockCacheKey implements HeapSize, 
java.io.Serializable {
    * @param hfileName The name of the HFile this block belongs to.
    * @param offset    Offset of the block into the file
    */
-  public BlockCacheKey(String hfileName, long offset) {
-    this(hfileName, offset, true, BlockType.DATA);
-  }
-
-  public BlockCacheKey(String hfileName, long offset, boolean isPrimaryReplica,
-    BlockType blockType) {
-    this.isPrimaryReplicaBlock = isPrimaryReplica;
-    this.hfileName = hfileName;
-    this.offset = offset;
-    this.blockType = blockType;
+  public BlockCacheKey(Path hfilePath, long offset) {
+    this(hfilePath, offset, true, BlockType.DATA);
   }

Review Comment:
   Can we just leave these and avoid having to touch every single existing 
test? As having a complete path is only relevant for the TIME_RANGE data 
tiering, I think it's fine to allow non complete paths when TIME_RANGE data 
tiering is not in use.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketProtoUtils.java:
##########
@@ -130,10 +136,30 @@ static Pair<ConcurrentHashMap<BlockCacheKey, 
BucketEntry>, NavigableSet<BlockCac
     ConcurrentHashMap<BlockCacheKey, BucketEntry> result = new 
ConcurrentHashMap<>();
     NavigableSet<BlockCacheKey> resultSet = new 
ConcurrentSkipListSet<>(Comparator
       
.comparing(BlockCacheKey::getHfileName).thenComparingLong(BlockCacheKey::getOffset));
+
+    Map<String, Path> allFilePaths = null;
+    DataTieringManager dataTieringManager;
+    try {
+      dataTieringManager = DataTieringManager.getInstance();
+      allFilePaths = dataTieringManager.getAllFilesList();

Review Comment:
   We don't need this for non TIME_RANGE tiering type, right? Can we avoid it, 
then?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketProtoUtils.java:
##########
@@ -130,10 +136,30 @@ static Pair<ConcurrentHashMap<BlockCacheKey, 
BucketEntry>, NavigableSet<BlockCac
     ConcurrentHashMap<BlockCacheKey, BucketEntry> result = new 
ConcurrentHashMap<>();
     NavigableSet<BlockCacheKey> resultSet = new 
ConcurrentSkipListSet<>(Comparator
       
.comparing(BlockCacheKey::getHfileName).thenComparingLong(BlockCacheKey::getOffset));
+
+    Map<String, Path> allFilePaths = null;
+    DataTieringManager dataTieringManager;
+    try {
+      dataTieringManager = DataTieringManager.getInstance();
+      allFilePaths = dataTieringManager.getAllFilesList();
+    } catch (IllegalStateException e) {
+      // Data-Tiering manager has not been set up.
+      // Ignore the error and proceed with the normal flow.
+      LOG.warn("Error while getting DataTieringManager instance: {}", 
e.getMessage());
+    }
+
     for (BucketCacheProtos.BackingMapEntry entry : backingMap.getEntryList()) {
       BucketCacheProtos.BlockCacheKey protoKey = entry.getKey();
-      BlockCacheKey key = new BlockCacheKey(protoKey.getHfilename(), 
protoKey.getOffset(),
-        protoKey.getPrimaryReplicaBlock(), fromPb(protoKey.getBlockType()));
+
+      BlockCacheKey key = null;
+      if (allFilePaths != null) {
+        key = new BlockCacheKey(allFilePaths.get(protoKey.getHfilename()), 
protoKey.getOffset(),
+          protoKey.getPrimaryReplicaBlock(), fromPb(protoKey.getBlockType()));
+      } else {
+        key = new BlockCacheKey(new Path(protoKey.getHfilename()), 
protoKey.getOffset(),

Review Comment:
   In which cases will `allFilesPath` be non null? If it's whenever data 
tiering isn't TIME_RANGE, I think it's ok to don't bother about file path. In 
those cases though, I think we should use the fileName version of the cache key 
constructor.
   
   And when it isn't null, what do we do with keys that weren't found in 
`allFilesPath`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-28527: Adjust BlockCacheKey to use the file path instead of file name. [hbase]

Reply via email to