Re: [PR] HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell [hbase]

via GitHub Mon, 20 Jan 2025 18:40:44 -0800


EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1922992122



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws IOException {
       if (includeDeleteMarker) {
         this.deletes.add(cell);
       }
-      return MatchCode.SKIP;
+      // In some cases, optimization can not be done
+      if (!canOptimizeReadDeleteMarkers()) {
+        return MatchCode.SKIP;
+      }
     }
-    returnCode = checkDeleted(deletes, cell);
-    if (returnCode != null) {
+    // optimization when prevCell is Delete or DeleteFamilyVersion
+    if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+      return returnCode;
+    }
+    if ((returnCode = checkDeleted(deletes, cell)) != null) {
       return returnCode;
     }
     return matchColumn(cell, timestamp, typeByte);
   }
 
+  // If prevCell is a delete marker and cell is a delete marked Put or delete marker,
+  // it means the cell is deleted effectively.
+  // And we can do SEEK_NEXT_COL.
+  private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell prevCell) {
+    if (
+      prevCell != null && canOptimizeReadDeleteMarkers()
+        && CellUtil.matchingRowColumn(prevCell, cell) && CellUtil.matchingTimestamp(prevCell, cell)
+        && (PrivateCellUtil.isDeleteType(prevCell)
+          || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+    ) {
+      return MatchCode.SEEK_NEXT_COL;
+    }
+    return null;
+  }
+
+  private boolean canOptimizeReadDeleteMarkers() {
+    // for simplicity, optimization works only for these cases
+    return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
+      && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
+  }

Review Comment:
   > Do we need to skip the optimization when KEEP_DELETED_CELL is true?
   
   To keep the implementation simple, I decided not to apply the optimization
   when KEEP_DELETED_CELL is true. I have also never run with KEEP_DELETED_CELL
   set to true myself. If the optimization turns out to be needed in other
   cases, I think it can be added later in a follow-up PR built on this one.
   
   > What is the downside of applying this PR optimization?
   
   I don't think this optimization carries a real penalty. It only adds a small
   computational cost to check whether the conditions for SEEK_NEXT_COL hold.
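
   For illustration only, the per-cell check amounts to a handful of
   comparisons. The sketch below is a standalone model, not the HBase code:
   `Cell`, `Type` and `canSeekNextCol` are hypothetical stand-ins for
   `ExtendedCell`, the cell type byte and `checkDeletedEffectively`, and the
   `optimizationAllowed` flag stands in for `canOptimizeReadDeleteMarkers()`.

```java
public class DeleteMarkerCheckSketch {
  // Hypothetical stand-in for the cell type byte; not the real KeyValue.Type.
  public enum Type { PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY_VERSION }

  // Hypothetical simplified cell; the real ExtendedCell works on byte ranges.
  public static final class Cell {
    final String row;
    final String column;
    final long timestamp;
    final Type type;

    public Cell(String row, String column, long timestamp, Type type) {
      this.row = row;
      this.column = column;
      this.timestamp = timestamp;
      this.type = type;
    }
  }

  // Models the shape of the condition in the diff: the previous cell must be a
  // Delete or DeleteFamilyVersion marker with the same row, column and timestamp
  // as the current cell, and the optimization must be allowed at all.
  public static boolean canSeekNextCol(Cell prev, Cell cur, boolean optimizationAllowed) {
    return prev != null && optimizationAllowed
      && prev.row.equals(cur.row) && prev.column.equals(cur.column)
      && prev.timestamp == cur.timestamp
      && (prev.type == Type.DELETE || prev.type == Type.DELETE_FAMILY_VERSION);
  }

  public static void main(String[] args) {
    Cell marker = new Cell("r1", "cf:q", 5L, Type.DELETE);
    Cell put = new Cell("r1", "cf:q", 5L, Type.PUT);
    // Same coordinates, optimization allowed.
    System.out.println(canSeekNextCol(marker, put, true));
    // Optimization disabled, e.g. when deleted cells must stay visible.
    System.out.println(canSeekNextCol(marker, put, false));
  }
}
```

   When the flag is false the check short-circuits after the null test, so the
   cases that are excluded for simplicity pay almost nothing extra.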



