Re: [PR] HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell [hbase]

via GitHub Thu, 23 Jan 2025 01:34:00 -0800


EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1926611791



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws 
IOException {
       if (includeDeleteMarker) {
         this.deletes.add(cell);
       }
-      return MatchCode.SKIP;
+      // In some cases, optimization can not be done
+      if (!canOptimizeReadDeleteMarkers()) {
+        return MatchCode.SKIP;
+      }
     }
-    returnCode = checkDeleted(deletes, cell);
-    if (returnCode != null) {
+    // optimization when prevCell is Delete or DeleteFamilyVersion
+    if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+      return returnCode;
+    }
+    if ((returnCode = checkDeleted(deletes, cell)) != null) {
       return returnCode;
     }
     return matchColumn(cell, timestamp, typeByte);
   }
 
+  // If prevCell is a delete marker and cell is a delete marked Put or delete 
marker,
+  // it means the cell is deleted effectively.
+  // And we can do SEEK_NEXT_COL.
+  private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell 
prevCell) {
+    if (
+      prevCell != null && canOptimizeReadDeleteMarkers()
+        && CellUtil.matchingRowColumn(prevCell, cell) && 
CellUtil.matchingTimestamp(prevCell, cell)
+        && (PrivateCellUtil.isDeleteType(prevCell)
+          || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+    ) {
+      return MatchCode.SEEK_NEXT_COL;
+    }
+    return null;
+  }
+
+  private boolean canOptimizeReadDeleteMarkers() {
+    // for simplicity, optimization works only for these cases
+    return !seePastDeleteMarkers && scanMaxVersions == 1 && 
!visibilityLabelEnabled
+      && getFilter() == null && !(deletes instanceof 
NewVersionBehaviorTracker);
+  }

Review Comment:
   > > Thanks @EungsopYoo, how about you keep KEEP_DELETED_CELL false above and 
see how perf numbers look like with and without your patch?
   > 
   > I have run the same tests again except KEEP_DELETED_CELL is set false.
   > 
   > master
   > 
   > 1. scan with dual file compaction enabled 4047ms
   > 2. scan with dual file compaction disabled 4277ms
   > 3. scan with delete markers and dual file compaction disabled 2.5031 
seconds
   > 4. scan with delete markers and dual file compaction enabled 0.0198 seconds
   > 
   > this PR
   > 
   > 1. scan with dual file compaction enabled 4134ms
   > 2. scan with dual file compaction disabled 3383ms
   > 3. scan with delete markers and dual file compaction disabled 3.4726 
seconds
   > 4. scan with delete markers and dual file compaction enabled 0.0245 seconds
   > 
   > It looks like there is some performance degradation on the result 3. I 
will dig into it.
   
   
https://github.com/EungsopYoo/hbase/blob/63901155caf5c226b02564128669234c08251e8d/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java#L81-L99
   
   The slight performance degradation is due to removing of early return of 
MatchCode.SKIP in the normal cases. Because of the removal of early return, the 
methods like `checkDeleted()`, `matchColumn()` are executed more than before, 
and then some burden of computation is added. I found the result by removing 
added code blocks one by one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell [hbase]

Reply via email to