Re: [PR] HDDS-14038. Optimize SnapshotDiff by removing intermediate tables [ozone]

via GitHub Mon, 08 Dec 2025 16:22:24 -0800


Copilot commented on code in PR #9399:
URL: https://github.com/apache/ozone/pull/9399#discussion_r2600605516



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -982,24 +982,18 @@ public int getInFlightSnapshotCount() {
     return inFlightSnapshotCount.get();
   }
 
-  private int getIndexFromToken(final String token) throws IOException {
+  private String validateToken(final String token) throws IOException {
     if (isBlank(token)) {
-      return 0;
+      return "";
     }
 
     // Validate that token passed in the request is valid integer as of now.
     // Later we can change it if we migrate to encrypted or cursor token.
-    try {
-      int index = Integer.parseInt(token);
-      if (index < 0) {
-        throw new IOException("Passed token is invalid. Resend the request " +
-            "with valid token returned in previous request.");
-      }
-      return index;
-    } catch (NumberFormatException exception) {
+    if (!token.matches("[0-9]+")) {

Review Comment:
   The token validation regex `[0-9]+` is too permissive for the new token 
format. Tokens should now follow the format 
`<diffTypePrefix><20-digit-padded-index>` where diffTypePrefix is "1", "2", 
"3", or "4", followed by exactly 20 digits. The current validation would accept 
any sequence of digits like "43" which doesn't match the required format. 
Consider validating the token format more strictly, e.g., `[1-4][0-9]{20}`.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -591,16 +636,12 @@ Pair<List<DiffReportEntry>, String> createPageResponse(
    */
   @VisibleForTesting
   void checkReportsIntegrity(final SnapshotDiffJob diffJob,
-                             final int pageStartIdx,
-                             final int numberOfEntriesInPage)
-      throws IOException {
-    if ((pageStartIdx >= diffJob.getTotalDiffEntries() &&
-        numberOfEntriesInPage != 0) || (pageStartIdx <
-        diffJob.getTotalDiffEntries() && numberOfEntriesInPage == 0)) {
-      LOG.error("Expected TotalDiffEntries: {} but found " +
-              "TotalDiffEntries: {}",
-          diffJob.getTotalDiffEntries(),
-          pageStartIdx + numberOfEntriesInPage);
+                             final String largestPageIndex,
+                             boolean lastPage) throws IOException {
+    // For last page check last entry returned if the largest entry key equals 
the largest key stored in the job entry.
+    if (lastPage && !diffJob.getLargestEntryKey().equals(largestPageIndex)) {

Review Comment:
   Potential NullPointerException: `diffJob.getLargestEntryKey()` could be null 
(e.g., when job is created but not yet completed, or when totalDiffEntries is 
0), but the code calls `.equals()` on it without a null check. This would cause 
an NPE. Consider adding a null check: `if (lastPage && 
(diffJob.getLargestEntryKey() == null || 
!diffJob.getLargestEntryKey().equals(largestPageIndex)))`.
   ```suggestion
       if (lastPage && (diffJob.getLargestEntryKey() == null || 
!diffJob.getLargestEntryKey().equals(largestPageIndex))) {
   ```



##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestOmSnapshot.java:
##########
@@ -1436,7 +1436,7 @@ public void testSnapDiff() throws Exception {
 
     IOException ioException = assertThrows(IOException.class,
         () -> store.snapshotDiff(volume, bucket, snap6,
-            snap7, "3", 0, forceFullSnapshotDiff, disableNativeDiff));
+            snap7, "43", 0, forceFullSnapshotDiff, disableNativeDiff));

Review Comment:
   The token "43" does not match the new token format. Tokens should now be 
formatted as `<diffType><20-digit-padded-index>`. For MODIFY type (prefix "4") 
with index 3, the token should be "400000000000000000003" instead of "43". The 
test should use the properly formatted token.
   ```suggestion
               snap7, "400000000000000000003", 0, forceFullSnapshotDiff, 
disableNativeDiff));
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -982,24 +982,18 @@ public int getInFlightSnapshotCount() {
     return inFlightSnapshotCount.get();
   }
 
-  private int getIndexFromToken(final String token) throws IOException {
+  private String validateToken(final String token) throws IOException {
     if (isBlank(token)) {
-      return 0;
+      return "";
     }
 
     // Validate that token passed in the request is valid integer as of now.
     // Later we can change it if we migrate to encrypted or cursor token.

Review Comment:
   The comment states "Validate that token passed in the request is valid 
integer as of now" but the implementation has changed. Tokens are no longer 
simple integers - they now follow the format 
`<diffTypePrefix><20-digit-padded-index>`. The comment should be updated to 
reflect the new token format.
   ```suggestion
       // Validate that the token passed in the request matches the expected 
format:
       // <diffTypePrefix><20-digit-padded-index>. Update this logic if the 
token format changes in the future.
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -524,15 +565,21 @@ SnapshotDiffReportOzone createPageResponse(
       final String bucketName,
       final String fromSnapshotName,
       final String toSnapshotName,
-      final int index,
+      final String index,
       final int pageSize
   ) throws IOException {
-    if (index < 0 || index > snapDiffJob.getTotalDiffEntries()
-        || pageSize <= 0) {
+    if (!isBlank(index)) {
+      
DIFF_TYPE_STRING_MAP.values().stream().filter(index::startsWith).findFirst()
+          .orElseThrow(() -> new IOException("Token " + index + " has invalid 
prefix. Valid prefixes: "
+              + 
DIFF_TYPE_STRING_MAP.values().stream().map(String::valueOf).collect(Collectors.joining(","))));
+    }
+
+    int idx = isBlank(index) ? 0 : Integer.parseInt(index.substring(1));

Review Comment:
   Potential uncaught 'java.lang.NumberFormatException'.
   ```suggestion
       int idx;
       if (isBlank(index)) {
         idx = 0;
       } else {
         try {
           idx = Integer.parseInt(index.substring(1));
         } catch (NumberFormatException e) {
           throw new IOException("Token " + index + " has invalid numeric part: 
" +
               index.substring(1) + ". It should be a valid integer.", e);
         }
       }
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -548,39 +595,37 @@ SnapshotDiffReportOzone createPageResponse(
 
   Pair<List<DiffReportEntry>, String> createPageResponse(
       final SnapshotDiffJob snapDiffJob,
-      final int index,
+      final String index,
       final int pageSize
   ) throws IOException {
     List<DiffReportEntry> diffReportList = new ArrayList<>();
 
-    boolean hasMoreEntries = true;
-
     byte[] lowerIndex = codecRegistry.asRawData(getReportKeyForIndex(
         snapDiffJob.getJobId(), index));
-    byte[] upperIndex = codecRegistry.asRawData(getReportKeyForIndex(
-        snapDiffJob.getJobId(), index + pageSize));
-    int idx = index;
+    String highestPossiblePrefix =
+        
StringUtils.getLexicographicallyHigherString(DIFF_TYPE_STRING_MAP.values().stream()
+            .max(String::compareTo).get());
+    byte[] upperIndex = 
codecRegistry.asRawData(getReportKeyForIndex(snapDiffJob.getJobId(), 
highestPossiblePrefix));
     try (ClosableIterator<Map.Entry<byte[], byte[]>> iterator =
-             snapDiffReportTable.iterator(Optional.of(lowerIndex),
-                 Optional.of(upperIndex))) {
+             snapDiffReportTable.iterator(Optional.of(lowerIndex), 
Optional.of(upperIndex))) {
       int itemsFetched = 0;
+      String pageLastKey = "";
       while (iterator.hasNext() && itemsFetched < pageSize) {
         Map.Entry<byte[], byte[]> entry = iterator.next();
+        pageLastKey = StringCodec.get().fromPersistedFormat(entry.getKey());
         byte[] bytes = entry.getValue();
-        diffReportList.add(codecRegistry.asObject(bytes,
-            DiffReportEntry.class));
-        idx += 1;
+        diffReportList.add(codecRegistry.asObject(bytes, 
DiffReportEntry.class));
         itemsFetched += 1;
       }
-      if (diffReportList.size() < pageSize) {
-        hasMoreEntries = false;
-      }
+      // Next token
+      String nextTokenString = iterator.hasNext() ?
+          
getIndexFromReportKey(StringCodec.get().fromPersistedFormat(iterator.next().getKey()))
 : null;

Review Comment:
   Calling `iterator.next()` to peek at the next token will consume that 
element from the iterator, causing the next page to skip an entry. This is a 
critical pagination bug. The next token should be generated without consuming 
the iterator element. Consider storing the key from the last iteration, or 
using a different approach to determine if there are more pages (e.g., check if 
`itemsFetched < totalDiffEntries`).
   ```suggestion
         String nextTokenString = iterator.hasNext() ? 
getIndexFromReportKey(pageLastKey) : null;
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -1208,7 +1227,7 @@ long generateDiffReport(
            */
           byte[] oldKeyName = oldObjIdToKeyMap.get(id);
           byte[] newKeyName = newObjIdToKeyMap.get(id);
-
+          String jobKey = "";

Review Comment:
   The `jobKey` variable is declared but not initialized at line 1230, and may 
remain empty if no diff entry is added in certain branches (e.g., when `key` is 
null at lines 1238-1241, 1245-1248, or 1253-1256). This could lead to incorrect 
tracking of `largestJobKey` at line 1294. Consider initializing `jobKey` to 
empty string at the start of each iteration, or ensuring it's only compared 
when it has been set. For example: `String jobKey = ""` before the if-else 
chain.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-14038. Optimize SnapshotDiff by removing intermediate tables [ozone]

Reply via email to