amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2229991194
########## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ########## @@ -1829,6 +1879,91 @@ public void testRemoveSnapshotsNoOp() throws Exception { .isSameAs(current); } + @TestTemplate + public void testFileCleanupOnAllRefsAgedOff() { + table.newAppend().appendFile(FILE_A).commit(); + Set<String> expectedDeleteFiles = + ImmutableSet.of(table.currentSnapshot().manifestListLocation()); + String tag = "tag"; + long tagAgeMs = 20; + table + .manageSnapshots() + .createTag(tag, table.currentSnapshot().snapshotId()) + .setMaxRefAgeMs(tag, tagAgeMs) + .commit(); + long currentTime = System.currentTimeMillis(); + table.newAppend().appendFile(FILE_B).appendFile(FILE_C).commit(); + + waitUntilAfter(currentTime + tagAgeMs); + + Set<String> deletedFiles = Sets.newHashSet(); + removeSnapshots(table) + .cleanExpiredFiles(true) + .expireOlderThan(System.currentTimeMillis()) + .deleteWith(deletedFiles::add) + .commit(); + assertThat(deletedFiles).isEqualTo(expectedDeleteFiles); + } + + @TestTemplate + public void testCannotIncrementallyCleanupBranchBeforeExpiration() { + assumeThat(incrementalCleanup).isTrue(); + + table.newAppend().appendFile(FILE_A).commit(); + String branch = "test"; + long branchAgeMs = 20; + table + .manageSnapshots() + .createBranch(branch) + .setMaxRefAgeMs(branch, branchAgeMs) + .setMinSnapshotsToKeep(branch, 1) + .commit(); + table.newDelete().deleteFile(FILE_A).commit(); + + Set<String> deletedFiles = Sets.newHashSet(); + assertThatThrownBy( + () -> + removeSnapshots(table) + .deleteWith(deletedFiles::add) + .expireOlderThan(System.currentTimeMillis()) + .commit()) + .isInstanceOf(UnsupportedOperationException.class) + .hasMessage( + "Cannot incrementally clean files when metadata before expiration has other branches"); + } + + @TestTemplate + public void testReachableCleanupWhenBranchAgedOff() { + assumeThat(incrementalCleanup).isFalse(); + + table.newAppend().appendFile(FILE_A).commit(); + String branch = "test"; + long branchAgeMs = 20; + table + .manageSnapshots() + .createBranch(branch) + .setMaxRefAgeMs(branch, branchAgeMs) + .setMinSnapshotsToKeep(branch, 1) + .commit(); + table.newDelete().deleteFile(FILE_A).commit(); + Snapshot latestTestSnapshot = table.snapshot(branch); + long currentTime = System.currentTimeMillis(); + waitUntilAfter(currentTime + branchAgeMs); + + Set<String> expectedDeletedFiles = + ImmutableSet.of( + latestTestSnapshot.manifestListLocation(), + Iterables.getOnlyElement(latestTestSnapshot.allManifests(table.io())).path(), + FILE_A.location()); Review Comment: I've updated the algorithm but this test case will still fail for the incremental case since the incremental cleanup doesn't cleanup FILE_A when it should be able to. Incremental should be able to apply for this case because the commit graph looks like: Before expiration: ``` Snapshot with FILE_A --- Snapshot with FILE_A_REMOVED (main, branch) ``` After expiration: ``` Snapshot with FILE_A_REMOVED (main) ``` In the incremental case only the manifest list and manifest from the first snapshot are removed. I think this problem is independent of the branching, even with just removing the branching from the test and just committing to main, FILE_A still doesn't end up getting cleaned up when I would expect it to be cleaned up since the new manifest would have a DELETED entry for it, and the file is orphaned. I also confirmed that this is independent of my changes here, the issue can be reproduced in my local master branch. I'll need to dig into this a bit more, but I think it's something missing in the incremental cleanup logic to detect the ability to cleanup FILE_A. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org