amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2231902383
########## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ########## @@ -1829,6 +1879,91 @@ public void testRemoveSnapshotsNoOp() throws Exception { .isSameAs(current); } + @TestTemplate + public void testFileCleanupOnAllRefsAgedOff() { + table.newAppend().appendFile(FILE_A).commit(); + Set<String> expectedDeleteFiles = + ImmutableSet.of(table.currentSnapshot().manifestListLocation()); + String tag = "tag"; + long tagAgeMs = 20; + table + .manageSnapshots() + .createTag(tag, table.currentSnapshot().snapshotId()) + .setMaxRefAgeMs(tag, tagAgeMs) + .commit(); + long currentTime = System.currentTimeMillis(); + table.newAppend().appendFile(FILE_B).appendFile(FILE_C).commit(); + + waitUntilAfter(currentTime + tagAgeMs); + + Set<String> deletedFiles = Sets.newHashSet(); + removeSnapshots(table) + .cleanExpiredFiles(true) + .expireOlderThan(System.currentTimeMillis()) + .deleteWith(deletedFiles::add) + .commit(); + assertThat(deletedFiles).isEqualTo(expectedDeleteFiles); + } + + @TestTemplate + public void testCannotIncrementallyCleanupBranchBeforeExpiration() { + assumeThat(incrementalCleanup).isTrue(); + + table.newAppend().appendFile(FILE_A).commit(); + String branch = "test"; + long branchAgeMs = 20; + table + .manageSnapshots() + .createBranch(branch) + .setMaxRefAgeMs(branch, branchAgeMs) + .setMinSnapshotsToKeep(branch, 1) + .commit(); + table.newDelete().deleteFile(FILE_A).commit(); + + Set<String> deletedFiles = Sets.newHashSet(); + assertThatThrownBy( + () -> + removeSnapshots(table) + .deleteWith(deletedFiles::add) + .expireOlderThan(System.currentTimeMillis()) + .commit()) + .isInstanceOf(UnsupportedOperationException.class) + .hasMessage( + "Cannot incrementally clean files when metadata before expiration has other branches"); + } + + @TestTemplate + public void testReachableCleanupWhenBranchAgedOff() { + assumeThat(incrementalCleanup).isFalse(); + + table.newAppend().appendFile(FILE_A).commit(); + String branch = "test"; + long branchAgeMs = 20; + table + .manageSnapshots() + .createBranch(branch) + .setMaxRefAgeMs(branch, branchAgeMs) + .setMinSnapshotsToKeep(branch, 1) + .commit(); + table.newDelete().deleteFile(FILE_A).commit(); + Snapshot latestTestSnapshot = table.snapshot(branch); + long currentTime = System.currentTimeMillis(); + waitUntilAfter(currentTime + branchAgeMs); + + Set<String> expectedDeletedFiles = + ImmutableSet.of( + latestTestSnapshot.manifestListLocation(), + Iterables.getOnlyElement(latestTestSnapshot.allManifests(table.io())).path(), + FILE_A.location()); Review Comment: I talked this through with @rdblue earlier today, the TLDR is that incremental cleanup actually cannot safely detect that FILE_A really can be cleaned up until the snapshot which marked it as deleted is itself removed (which is not the case). So I think we can treat this as expected behavior. I've updated this test to add another commit on main and then expiring the snapshot which would add and remove FILE_A. This verifies that FILE_A can safely be cleaned up incrementally even though there used to be a branch before because that branch was entirely on main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org