Re: [PR] HDDS-14038. Optimize SnapshotDiff by removing intermediate tables [ozone]

via GitHub Tue, 17 Feb 2026 15:22:07 -0800


Copilot commented on code in PR #9399:
URL: https://github.com/apache/ozone/pull/9399#discussion_r2819541517



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -1231,7 +1262,7 @@ private SnapshotDiffResponse 
submitJob(SnapshotDiffManager diffManager,
                                          String toSnapshotName) {
     try {
       return diffManager.getSnapshotDiffReport(VOLUME_NAME, BUCKET_NAME,
-          fromSnapshotName, toSnapshotName, 0, 1000, false, false);
+          fromSnapshotName, toSnapshotName, "10", 1000, false, false);

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.
   ```suggestion
             fromSnapshotName, toSnapshotName, "", 1000, false, false);
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -588,16 +643,12 @@ Pair<List<DiffReportEntry>, String> createPageResponse(
    */
   @VisibleForTesting
   void checkReportsIntegrity(final SnapshotDiffJob diffJob,
-                             final int pageStartIdx,
-                             final int numberOfEntriesInPage)
-      throws IOException {
-    if ((pageStartIdx >= diffJob.getTotalDiffEntries() &&
-        numberOfEntriesInPage != 0) || (pageStartIdx <
-        diffJob.getTotalDiffEntries() && numberOfEntriesInPage == 0)) {
-      LOG.error("Expected TotalDiffEntries: {} but found " +
-              "TotalDiffEntries: {}",
-          diffJob.getTotalDiffEntries(),
-          pageStartIdx + numberOfEntriesInPage);
+                             final String largestPageIndex,
+                             boolean lastPage) throws IOException {
+    // For last page check last entry returned if the largest entry key equals 
the largest key stored in the job entry.
+    if (lastPage && (diffJob.getLargestEntryKey() == null || 
!diffJob.getLargestEntryKey().equals(largestPageIndex))) {

Review Comment:
   The integrity check may fail incorrectly when the last page is empty. If no 
entries are fetched on the last page (iterator has no more entries from the 
start), `pageLastKey` remains as an empty string (initialized on line 620), 
while `diffJob.getLargestEntryKey()` would be a non-null valid key. This 
mismatch would cause the integrity check to fail and mark the job as FAILED 
incorrectly. Consider checking if any items were fetched before performing the 
integrity check, or update the logic to handle the case where the page is empty.



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -1324,7 +1355,7 @@ public void testGetSnapshotDiffReportJob() throws 
Exception {
     for (int i = 0; i < snapshotInfoList.size(); i++) {
       SnapshotDiffResponse snapshotDiffReport =
           spy.getSnapshotDiffReport(VOLUME_NAME, BUCKET_NAME,
-              snapshotInfo.getName(), snapshotInfoList.get(i).getName(), 0,
+              snapshotInfo.getName(), snapshotInfoList.get(i).getName(), "10",

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.
   ```suggestion
                 snapshotInfo.getName(), snapshotInfoList.get(i).getName(), "",
   ```



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -791,7 +822,7 @@ public void testGetSnapshotDiffReportForCancelledJob() 
throws IOException {
 
     // Response should still be cancelled.
     snapshotDiffResponse = spy.getSnapshotDiffReport(volumeName, bucketName,
-        fromSnapshotName, toSnapshotName, 0, 0, false, false);
+        fromSnapshotName, toSnapshotName, "10", 0, false, false);

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -766,7 +797,7 @@ public void testGetSnapshotDiffReportForCancelledJob() 
throws IOException {
     // Submit a new job.
     SnapshotDiffResponse snapshotDiffResponse =
         spy.getSnapshotDiffReport(volumeName, bucketName, fromSnapshotName,
-            toSnapshotName, 0, 0, false, false);
+            toSnapshotName, "10", 0, false, false);

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -932,7 +963,7 @@ public void testListSnapshotDiffJobs(String jobStatus,
     // SnapshotDiffReport
     SnapshotDiffResponse snapshotDiffResponse =
         spy.getSnapshotDiffReport(volumeName, bucketName, fromSnapshotName,
-            toSnapshotName, 0, 0, false, false);
+            toSnapshotName, "10", 0, false, false);

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.
   ```suggestion
               toSnapshotName, "", 0, false, false);
   ```



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -979,7 +1010,7 @@ public void testListSnapDiffWithInvalidStatus() throws 
IOException {
         eq(toSnapshotName), eq(false), eq(false));
 
     spy.getSnapshotDiffReport(volumeName, bucketName, fromSnapshotName,
-            toSnapshotName, 0, 0, false, false);
+            toSnapshotName, "10", 0, false, false);

Review Comment:
   The test is passing "10" as a token parameter, which doesn't match the new 
token format. According to the new implementation, tokens should be formatted 
as `<diffTypePrefix><20-digit-padded-index>`. For example, a valid token would 
be "100000000000000000000" (DELETE type, index 0) or "200000000000000000005" 
(RENAME type, index 5). The token "10" is only 2 characters and doesn't follow 
the 21-character format (1 prefix + 20 padded digits). This should be updated 
to use a properly formatted token or an empty string if starting from the 
beginning.
   ```suggestion
               toSnapshotName, "", 0, false, false);
   ```



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java:
##########
@@ -705,26 +732,30 @@ public void testCreatePageResponse(int startIdx,
     db.get().put(snapDiffJobTable,
         codecRegistry.asRawData(testJobId2),
         codecRegistry.asRawData(snapshotDiffJob2));
+    for (int pageSize = -1; pageSize <= totalNumberOfRecords; pageSize++) {
+      if (pageSize <= 0) {
+        int finalPageSize = pageSize;
+        assertThrows(IOException.class,
+            () -> snapshotDiffManager.createPageResponse(snapshotDiffJob, 
"vol",
+                "buck", "fs", "ts", pageMap.get(startIdx), finalPageSize));
+        continue;
+      }
+      SnapshotDiffReportOzone snapshotDiffReportOzone =
+          snapshotDiffManager.createPageResponse(snapshotDiffJob, "vol",
+              "buck", "fs", "ts",
+              pageMap.get(startIdx), pageSize);

Review Comment:
   The test is using `pageMap.get(startIdx)` to retrieve the token, but this 
retrieves the token for the entry at position `startIdx` in the original 
unsorted list, not the sorted list. Since entries are sorted by diff type 
(lines 711-716), the order in the database is different from the original 
order. The test should use `pageMap.get(indexList.get(startIdx))` to retrieve 
the token for the entry at position `startIdx` in the sorted list, which 
matches the actual order of entries in the database.
   ```suggestion
                   "buck", "fs", "ts", pageMap.get(indexList.get(startIdx)), 
finalPageSize));
           continue;
         }
         SnapshotDiffReportOzone snapshotDiffReportOzone =
             snapshotDiffManager.createPageResponse(snapshotDiffJob, "vol",
                 "buck", "fs", "ts",
                 pageMap.get(indexList.get(startIdx)), pageSize);
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -954,24 +954,18 @@ public int getInFlightSnapshotCount() {
     return inFlightSnapshotCount.get();
   }
 
-  private int getIndexFromToken(final String token) throws IOException {
+  private String validateToken(final String token) throws IOException {
     if (isBlank(token)) {
-      return 0;
+      return "";
     }
 
-    // Validate that token passed in the request is valid integer as of now.
-    // Later we can change it if we migrate to encrypted or cursor token.
-    try {
-      int index = Integer.parseInt(token);
-      if (index < 0) {
-        throw new IOException("Passed token is invalid. Resend the request " +
-            "with valid token returned in previous request.");
-      }
-      return index;
-    } catch (NumberFormatException exception) {
+    // Validate that the token passed in the request matches the expected 
format:
+    // <diffTypePrefix><20-digit-padded-index>. Update this logic if the token 
format changes in the future.
+    if (!token.matches("[0-9]+")) {

Review Comment:
   The token validation regex is too permissive and doesn't enforce the new 
token format. Tokens should follow the format 
`<diffTypePrefix><20-digit-padded-index>` where diffTypePrefix is "1", "2", 
"3", or "4", followed by exactly 20 digits. The current regex `[0-9]+` would 
accept invalid tokens like "10" or "43". The regex should be changed to 
`[1-4][0-9]{20}` to properly validate the expected format of 21 total 
characters.
   ```suggestion
       if (!token.matches("[1-4][0-9]{20}")) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-14038. Optimize SnapshotDiff by removing intermediate tables [ozone]

Reply via email to