Copilot commented on code in PR #9690: URL: https://github.com/apache/ozone/pull/9690#discussion_r2755166346
########## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RangeQueryIndex.java: ########## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdds.utils.db; + +import java.io.IOException; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.TreeMap; + +/** + * An index for answering "does this point fall within any of these ranges?" efficiently. + * + * <p>The indexed ranges are <em>half-open</em> intervals of the form + * {@code [startInclusive, endExclusive)}. + * + * <p><strong>Core idea (sweep-line / prefix-sum over range boundaries):</strong> + * Instead of scanning every range on each query, this index stores a sorted map from + * boundary points to a running count of "active" ranges at that point. + * + * <ul> + * <li>For each range {@code [s, e)}, we add a delta {@code +1} at {@code s} and a delta + * {@code -1} at {@code e}.</li> + * <li>We then convert the deltas into a prefix sum in key order, so every boundary key + * stores the number of ranges active at that coordinate.</li> + * <li>For any query point {@code k}, the active count is {@code floorEntry(k).value}. 
+ * If it is {@code > 0}, then {@code k} intersects at least one range.</li> + * </ul> + * + * <p><strong>Update model:</strong> this index supports only removing ranges that were part of the + * initial set. Removal updates the prefix sums for keys in {@code [startInclusive, endExclusive)} + * (net effect of removing {@code +1} at start and {@code -1} at end). + * + * <p><strong>Complexities:</strong> + * <ul> + * <li>Build: {@code O(R log B)} where {@code R} is #ranges and {@code B} is #distinct boundaries.</li> + * <li>{@link #containsIntersectingRange(Comparable)} (Object)}: {@code O(log B)}.</li> Review Comment: The complexity documentation contains a syntax error. The line states "{@link #containsIntersectingRange(Comparable)} (Object)}" but the "(Object)" appears to be a typo and should be removed. The correct format should be just "{@link #containsIntersectingRange(Comparable)}". ```suggestion * <li>{@link #containsIntersectingRange(Comparable)}: {@code O(log B)}.</li> ``` ########## hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBBatchOperation.java: ########## @@ -51,10 +58,19 @@ import org.rocksdb.RocksDBException; /** - * The TestRDBBatchOperation class provides test cases to validate the functionality of RDB batch operations - * in a RocksDB-based backend. It verifies the correct behavior of write operations using batch processing - * and ensures the integrity of operations like put and delete when performed in batch mode. - */ + * Test class for verifying batch operations with delete ranges using the + * RDBBatchOperation and MockedConstruction of ManagedWriteBatch. + * + * This test class includes: + * - Mocking and tracking of operations including put, delete, and delete range + * within a batch operation. + * - Validation of committed operations using assertions on collected data. + * - Ensures that the batch operation interacts correctly with the + * RocksDatabase and ColumnFamilyHandle components. 
+ * + * The test method includes: + * 1. Setup of mocked ColumnFamilyHandle and RocksDatabase.ColumnFamily. + * 2. Mocking of methods to track operations performed on*/ Review Comment: The JavaDoc comment for the TestRDBBatchOperation class is incomplete. The comment abruptly ends mid-sentence at "2. Mocking of methods to track operations performed on". This should be completed to provide a full description of what the test class does. ```suggestion * 2. Mocking of methods to track operations performed on the * ManagedWriteBatch and verification of the tracked operations after * the batch is committed. */ ``` ########## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RangeQueryIndex.java: ########## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdds.utils.db; + +import java.io.IOException; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.TreeMap; + +/** + * An index for answering "does this point fall within any of these ranges?" efficiently. + * + * <p>The indexed ranges are <em>half-open</em> intervals of the form + * {@code [startInclusive, endExclusive)}. 
+ * + * <p><strong>Core idea (sweep-line / prefix-sum over range boundaries):</strong> + * Instead of scanning every range on each query, this index stores a sorted map from + * boundary points to a running count of "active" ranges at that point. + * + * <ul> + * <li>For each range {@code [s, e)}, we add a delta {@code +1} at {@code s} and a delta + * {@code -1} at {@code e}.</li> + * <li>We then convert the deltas into a prefix sum in key order, so every boundary key + * stores the number of ranges active at that coordinate.</li> + * <li>For any query point {@code k}, the active count is {@code floorEntry(k).value}. + * If it is {@code > 0}, then {@code k} intersects at least one range.</li> + * </ul> + * + * <p><strong>Update model:</strong> this index supports only removing ranges that were part of the + * initial set. Removal updates the prefix sums for keys in {@code [startInclusive, endExclusive)} + * (net effect of removing {@code +1} at start and {@code -1} at end). + * + * <p><strong>Complexities:</strong> + * <ul> + * <li>Build: {@code O(R log B)} where {@code R} is #ranges and {@code B} is #distinct boundaries.</li> + * <li>{@link #containsIntersectingRange(Comparable)} (Object)}: {@code O(log B)}.</li> + * <li>{@link #removeRange(Range)}: {@code O(log B + K)} where {@code K} is #boundaries in the range.</li> + * </ul> + * + * @param <T> boundary type (must be {@link Comparable} to be stored in a {@link TreeMap}) + */ +class RangeQueryIndex<T extends Comparable<T>> { + + private final TreeMap<T, Integer> rangeCountIndexMap; + private final Set<Range<T>> ranges; + + RangeQueryIndex(Set<Range<T>> ranges) { + this.rangeCountIndexMap = new TreeMap<>(); + this.ranges = ranges; + init(); + } + + private void init() { + // Phase 1: store boundary deltas (+1 at start, -1 at end). + for (Range<T> range : ranges) { + rangeCountIndexMap.compute(range.startInclusive, (k, v) -> v == null ? 
1 : v + 1); + rangeCountIndexMap.compute(range.endExclusive, (k, v) -> v == null ? -1 : v - 1); + } + + // Phase 2: convert deltas to prefix sums so each key holds the active range count at that coordinate. + int totalCount = 0; + for (Map.Entry<T, Integer> entry : rangeCountIndexMap.entrySet()) { + totalCount += entry.getValue(); + entry.setValue(totalCount); + } + } + + /** + * Remove a range from the index. + * + * <p>This method assumes the range set is "popped" over time (ranges are removed but not added). + * Internally, removing {@code [s, e)} decreases the active count by 1 for all boundary keys in + * {@code [s, e)} and leaves counts outside the range unchanged. + * + * @throws IOException if the given {@code range} is not part of the indexed set + */ + void removeRange(Range<T> range) throws IOException { + if (!ranges.contains(range)) { + throw new IOException(String.format("Range %s not found in index structure : %s", range, ranges)); + } + ranges.remove(range); + for (Map.Entry<T, Integer> entry : rangeCountIndexMap.subMap(range.startInclusive, true, + range.endExclusive, false).entrySet()) { + entry.setValue(entry.getValue() - 1); + } + } + + /** + * @return true iff {@code key} is contained in at least one indexed range. + * + * <p>Implementation detail: uses {@link TreeMap#floorEntry(Object)} to find the last boundary + * at or before {@code key}, and checks the prefix-summed active count at that point.</p> + */ + boolean containsIntersectingRange(T key) { + Map.Entry<T, Integer> countEntry = rangeCountIndexMap.floorEntry(key); + if (countEntry == null) { + return false; + } + return countEntry.getValue() > 0; + } + + /** + * Returns an intersecting range containing {@code key}, if any. 
+ * + * <p>This method first checks {@link #containsIntersectingRange(Comparable)} using the index; + * if the count indicates an intersection exists, it then scans the backing {@link #ranges} + * set to find a concrete {@link Range} that contains {@code key}.</p> + * + * <p>Note that because {@link #ranges} is a {@link Set}, "first" refers to whatever iteration + * order that set provides (it is not guaranteed to be deterministic unless the provided set is).</p> + * + * @return a containing range, or null if none intersect + */ + Range<T> getFirstIntersectingRange(T key) { + Map.Entry<T, Integer> countEntry = rangeCountIndexMap.floorEntry(key); + if (countEntry == null) { + return null; + } + for (Range<T> range : ranges) { + if (range.contains(key)) { + return range; + } + } + return null; + } + + /** + * A half-open interval {@code [startInclusive, endExclusive)}. + * + * <p>For a value {@code k} to be contained, it must satisfy: + * {@code startInclusive <= k < endExclusive} (according to {@link Comparable#compareTo(Object)}).</p> + */ + static final class Range<T extends Comparable<T>> { + private final T startInclusive; + private final T endExclusive; + + Range(T startInclusive, T endExclusive) { + this.startInclusive = Objects.requireNonNull(startInclusive, "start == null"); + this.endExclusive = Objects.requireNonNull(endExclusive, "end == null"); + } + + @Override + public boolean equals(Object o) { + return this == o; + } + + @Override + public int hashCode() { + return Objects.hash(startInclusive, endExclusive); Review Comment: There is a violation of the equals-hashCode contract in the Range class. The equals method uses identity comparison (this == o), while hashCode computes the hash based on the values of startInclusive and endExclusive. 
This means two Range instances with identical bounds will have the same hashCode but will not be equal according to equals, which violates the Java contract that states: "If two objects are equal according to the equals method, then calling hashCode on each of them must produce the same integer result." While this appears to be intentional based on the test cases (e.g., testDuplicateSameBoundsDifferentInstances), it could lead to unexpected behavior when using Range objects in hash-based collections and makes the class fragile. Consider either making equals value-based to match hashCode, or making hashCode identity-based (e.g., using System.identityHashCode) to match equals. ```suggestion return System.identityHashCode(this); ``` ########## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RangeQueryIndex.java: ########## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdds.utils.db; + +import java.io.IOException; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.TreeMap; + +/** + * An index for answering "does this point fall within any of these ranges?" efficiently. 
+ * + * <p>The indexed ranges are <em>half-open</em> intervals of the form + * {@code [startInclusive, endExclusive)}. + * + * <p><strong>Core idea (sweep-line / prefix-sum over range boundaries):</strong> + * Instead of scanning every range on each query, this index stores a sorted map from + * boundary points to a running count of "active" ranges at that point. + * + * <ul> + * <li>For each range {@code [s, e)}, we add a delta {@code +1} at {@code s} and a delta + * {@code -1} at {@code e}.</li> + * <li>We then convert the deltas into a prefix sum in key order, so every boundary key + * stores the number of ranges active at that coordinate.</li> + * <li>For any query point {@code k}, the active count is {@code floorEntry(k).value}. + * If it is {@code > 0}, then {@code k} intersects at least one range.</li> + * </ul> + * + * <p><strong>Update model:</strong> this index supports only removing ranges that were part of the + * initial set. Removal updates the prefix sums for keys in {@code [startInclusive, endExclusive)} + * (net effect of removing {@code +1} at start and {@code -1} at end). + * + * <p><strong>Complexities:</strong> + * <ul> + * <li>Build: {@code O(R log B)} where {@code R} is #ranges and {@code B} is #distinct boundaries.</li> + * <li>{@link #containsIntersectingRange(Comparable)} (Object)}: {@code O(log B)}.</li> + * <li>{@link #removeRange(Range)}: {@code O(log B + K)} where {@code K} is #boundaries in the range.</li> + * </ul> + * + * @param <T> boundary type (must be {@link Comparable} to be stored in a {@link TreeMap}) + */ +class RangeQueryIndex<T extends Comparable<T>> { + + private final TreeMap<T, Integer> rangeCountIndexMap; + private final Set<Range<T>> ranges; + + RangeQueryIndex(Set<Range<T>> ranges) { + this.rangeCountIndexMap = new TreeMap<>(); + this.ranges = ranges; + init(); + } + + private void init() { + // Phase 1: store boundary deltas (+1 at start, -1 at end). 
+ for (Range<T> range : ranges) { + rangeCountIndexMap.compute(range.startInclusive, (k, v) -> v == null ? 1 : v + 1); + rangeCountIndexMap.compute(range.endExclusive, (k, v) -> v == null ? -1 : v - 1); + } + + // Phase 2: convert deltas to prefix sums so each key holds the active range count at that coordinate. + int totalCount = 0; + for (Map.Entry<T, Integer> entry : rangeCountIndexMap.entrySet()) { + totalCount += entry.getValue(); + entry.setValue(totalCount); + } + } + + /** + * Remove a range from the index. + * + * <p>This method assumes the range set is "popped" over time (ranges are removed but not added). + * Internally, removing {@code [s, e)} decreases the active count by 1 for all boundary keys in + * {@code [s, e)} and leaves counts outside the range unchanged. + * + * @throws IOException if the given {@code range} is not part of the indexed set + */ + void removeRange(Range<T> range) throws IOException { + if (!ranges.contains(range)) { + throw new IOException(String.format("Range %s not found in index structure : %s", range, ranges)); + } + ranges.remove(range); + for (Map.Entry<T, Integer> entry : rangeCountIndexMap.subMap(range.startInclusive, true, + range.endExclusive, false).entrySet()) { + entry.setValue(entry.getValue() - 1); + } + } + + /** + * @return true iff {@code key} is contained in at least one indexed range. + * + * <p>Implementation detail: uses {@link TreeMap#floorEntry(Object)} to find the last boundary + * at or before {@code key}, and checks the prefix-summed active count at that point.</p> + */ + boolean containsIntersectingRange(T key) { + Map.Entry<T, Integer> countEntry = rangeCountIndexMap.floorEntry(key); + if (countEntry == null) { + return false; + } + return countEntry.getValue() > 0; + } + + /** + * Returns an intersecting range containing {@code key}, if any. 
+ * + * <p>This method first checks {@link #containsIntersectingRange(Comparable)} using the index; + * if the count indicates an intersection exists, it then scans the backing {@link #ranges} + * set to find a concrete {@link Range} that contains {@code key}.</p> + * + * <p>Note that because {@link #ranges} is a {@link Set}, "first" refers to whatever iteration + * order that set provides (it is not guaranteed to be deterministic unless the provided set is).</p> + * + * @return a containing range, or null if none intersect + */ + Range<T> getFirstIntersectingRange(T key) { + Map.Entry<T, Integer> countEntry = rangeCountIndexMap.floorEntry(key); + if (countEntry == null) { + return null; + } + for (Range<T> range : ranges) { + if (range.contains(key)) { + return range; + } + } + return null; Review Comment: The documentation for getFirstIntersectingRange states that "This method first checks containsIntersectingRange using the index; if the count indicates an intersection exists, it then scans the backing ranges set", but the implementation doesn't actually verify that countEntry.getValue() > 0 before scanning the ranges set. This could lead to unnecessary O(n) scanning of the ranges set even when the index indicates no intersecting ranges exist. Consider adding a check like "if (countEntry.getValue() <= 0) { return null; }" after line 133 to match the documented behavior and improve performance. ########## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBBatchOperation.java: ########## @@ -332,6 +468,11 @@ void delete(CodecBuffer key) { overwriteIfExists(new DeleteOp(key)); } + void deleteRange(byte[] startKey, byte[] endKey) { + delRangeCount++; + ops.put(opIndex.getAndIncrement(), new DeleteRangeOperation(startKey, endKey)); Review Comment: The deleteRange method doesn't update the batchSize field when adding a DeleteRangeOperation, unlike the put and delete methods which call overwriteIfExists and update batchSize accordingly. 
This inconsistency means that batch size tracking will be incomplete, potentially affecting batch operation monitoring and logging. Consider adding "batchSize += new DeleteRangeOperation(startKey, endKey).totalLength();" or restructuring to ensure batch size is properly tracked for delete range operations. ```suggestion DeleteRangeOperation op = new DeleteRangeOperation(startKey, endKey); batchSize += op.totalLength(); ops.put(opIndex.getAndIncrement(), op); ``` ########## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBBatchOperation.java: ########## @@ -239,6 +251,40 @@ boolean closeImpl() { } } + /** + * Delete range operation to be applied to a {@link ColumnFamily} batch. + */ + private final class DeleteRangeOperation extends Op { Review Comment: DeleteRangeOperation should be made static, since the enclosing instance is not used. ```suggestion private static final class DeleteRangeOperation extends Op { ``` ########## pom.xml: ########## @@ -198,7 +198,7 @@ <protobuf.version>3.25.8</protobuf.version> <ranger.version>2.7.0</ranger.version> <!-- versions included in ratis-thirdparty, update in sync --> - <ratis-thirdparty.grpc.version>1.75.0</ratis-thirdparty.grpc.version> + <ratis-thirdparty.grpc.version>1.77.0</ratis-thirdparty.grpc.version> Review Comment: The gRPC version is being updated from 1.75.0 to 1.77.0, but this change is not mentioned in the PR description which focuses on optimizing search for deleted ranges. If this version bump is intentional and related to the changes in this PR, please update the PR description to explain why this dependency update is needed. If it's unrelated, consider moving it to a separate PR for clarity and to keep changes focused on a single purpose. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
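For readers following the RangeQueryIndex threads above, the sweep-line / prefix-sum technique the class javadoc describes can be illustrated with a minimal standalone sketch. This is not the PR's actual class: `PrefixSumRangeIndex`, `covers`, and `remove` are illustrative names, `int` bounds stand in for the generic `Comparable` type, and the backing range set (and its not-found check) is omitted for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of a prefix-sum index over half-open ranges [start, end).
// Build: +1 at each startInclusive, -1 at each endExclusive, then convert the
// deltas to running sums in key order. A point k is covered iff floorEntry(k)
// exists and its prefix-summed count is > 0.
class PrefixSumRangeIndex {
  private final TreeMap<Integer, Integer> counts = new TreeMap<>();

  PrefixSumRangeIndex(int[][] ranges) {
    // Phase 1: boundary deltas.
    for (int[] r : ranges) {
      counts.merge(r[0], 1, Integer::sum);   // +1 at startInclusive
      counts.merge(r[1], -1, Integer::sum);  // -1 at endExclusive
    }
    // Phase 2: prefix sums, so each key holds the active range count there.
    int running = 0;
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      running += e.getValue();
      e.setValue(running);
    }
  }

  // O(log B) point query: last boundary at or before key carries the count.
  boolean covers(int key) {
    Map.Entry<Integer, Integer> floor = counts.floorEntry(key);
    return floor != null && floor.getValue() > 0;
  }

  // Removal analog: decrement the active count for boundaries in [start, end).
  void remove(int start, int end) {
    for (Map.Entry<Integer, Integer> e
        : counts.subMap(start, true, end, false).entrySet()) {
      e.setValue(e.getValue() - 1);
    }
  }

  public static void main(String[] args) {
    PrefixSumRangeIndex idx =
        new PrefixSumRangeIndex(new int[][] {{5, 10}, {8, 12}});
    System.out.println(idx.covers(9));   // true: 9 lies in both [5,10) and [8,12)
    System.out.println(idx.covers(12));  // false: end bounds are exclusive
  }
}
```

Compared with scanning every range per query, this answers point queries in O(log B) over B distinct boundaries, which mirrors the build/query structure (`init()` and `containsIntersectingRange`) discussed in the review comments above.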
