amogh-jahagirdar commented on code in PR #13614:
URL: https://github.com/apache/iceberg/pull/13614#discussion_r2229991194


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1829,6 +1879,91 @@ public void testRemoveSnapshotsNoOp() throws Exception {
         .isSameAs(current);
   }
 
+  @TestTemplate
+  public void testFileCleanupOnAllRefsAgedOff() {
+    table.newAppend().appendFile(FILE_A).commit();
+    Set<String> expectedDeleteFiles =
+        ImmutableSet.of(table.currentSnapshot().manifestListLocation());
+    String tag = "tag";
+    long tagAgeMs = 20;
+    table
+        .manageSnapshots()
+        .createTag(tag, table.currentSnapshot().snapshotId())
+        .setMaxRefAgeMs(tag, tagAgeMs)
+        .commit();
+    long currentTime = System.currentTimeMillis();
+    table.newAppend().appendFile(FILE_B).appendFile(FILE_C).commit();
+
+    waitUntilAfter(currentTime + tagAgeMs);
+
+    Set<String> deletedFiles = Sets.newHashSet();
+    removeSnapshots(table)
+        .cleanExpiredFiles(true)
+        .expireOlderThan(System.currentTimeMillis())
+        .deleteWith(deletedFiles::add)
+        .commit();
+    assertThat(deletedFiles).isEqualTo(expectedDeleteFiles);
+  }
+
+  @TestTemplate
+  public void testCannotIncrementallyCleanupBranchBeforeExpiration() {
+    assumeThat(incrementalCleanup).isTrue();
+
+    table.newAppend().appendFile(FILE_A).commit();
+    String branch = "test";
+    long branchAgeMs = 20;
+    table
+        .manageSnapshots()
+        .createBranch(branch)
+        .setMaxRefAgeMs(branch, branchAgeMs)
+        .setMinSnapshotsToKeep(branch, 1)
+        .commit();
+    table.newDelete().deleteFile(FILE_A).commit();
+
+    Set<String> deletedFiles = Sets.newHashSet();
+    assertThatThrownBy(
+            () ->
+                removeSnapshots(table)
+                    .deleteWith(deletedFiles::add)
+                    .expireOlderThan(System.currentTimeMillis())
+                    .commit())
+        .isInstanceOf(UnsupportedOperationException.class)
+        .hasMessage(
+            "Cannot incrementally clean files when metadata before expiration 
has other branches");
+  }
+
+  @TestTemplate
+  public void testReachableCleanupWhenBranchAgedOff() {
+    assumeThat(incrementalCleanup).isFalse();
+
+    table.newAppend().appendFile(FILE_A).commit();
+    String branch = "test";
+    long branchAgeMs = 20;
+    table
+        .manageSnapshots()
+        .createBranch(branch)
+        .setMaxRefAgeMs(branch, branchAgeMs)
+        .setMinSnapshotsToKeep(branch, 1)
+        .commit();
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot latestTestSnapshot = table.snapshot(branch);
+    long currentTime = System.currentTimeMillis();
+    waitUntilAfter(currentTime + branchAgeMs);
+
+    Set<String> expectedDeletedFiles =
+        ImmutableSet.of(
+            latestTestSnapshot.manifestListLocation(),
+            
Iterables.getOnlyElement(latestTestSnapshot.allManifests(table.io())).path(),
+            FILE_A.location());

Review Comment:
   I've updated the algorithm but this test case will still fail for the 
incremental case since the incremental cleanup doesn't cleanup FILE_A when it 
should be able to. 
   
   
   Incremental should be able to apply for this case because the commit graph 
looks like:
   
   Before expiration:
                           
   ```
   Snapshot with FILE_A --- Snapshot with FILE_A_REMOVED  (main, branch)
   
   ```
   
   After expiration:
                           
   ```
   Snapshot with FILE_A_REMOVED  (main)
   ```
   
   In the incremental case only the manifest list and manifest from the first 
snapshot are removed. Reachability cleans all 3 files as expected.
   
   I think this problem is independent of the branching, even with just 
removing the branching from the test and just committing to main, FILE_A still 
doesn't end up getting cleaned up when I would expect it to be cleaned up since 
the new manifest would have a DELETED entry for it, and the file is orphaned.
   
   I also confirmed that this is independent of my changes here, the issue can 
be reproduced in my local master branch.
   
   I'll need to dig into this a bit more, but I think it's something missing in 
the incremental cleanup logic to detect the ability to cleanup FILE_A. cc 
@rdblue 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to