rdblue commented on code in PR #5669:
URL: https://github.com/apache/iceberg/pull/5669#discussion_r985315437
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -602,8 +623,9 @@ public void testExpireOlderThanMultipleCalls() {
}
// Retain last 2 snapshots and expire older than t3
- table
- .expireSnapshots()
+ RemoveSnapshots removeSnapshots = (RemoveSnapshots)
table.expireSnapshots();
+ removeSnapshots
+ .withIncrementalCleanup(incrementalCleanup)
Review Comment:
Why doesn't this use `removeSnapshots(Table)`?
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -555,8 +575,9 @@ public void testRetainLastKeepsExpiringSnapshot() {
}
// Retain last 2 snapshots and expire older than t3
- table
- .expireSnapshots()
+ RemoveSnapshots removeSnapshots = (RemoveSnapshots)
table.expireSnapshots();
+ removeSnapshots
+ .withIncrementalCleanup(incrementalCleanup)
Review Comment:
Why doesn't this use `removeSnapshots(Table)`?
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -504,7 +519,12 @@ public void testRetainNLargerThanCurrentSnapshots() {
// Retain last 4 snapshots
Transaction tx = table.newTransaction();
- tx.expireSnapshots().expireOlderThan(t3).retainLast(4).commit();
+ RemoveSnapshots removeSnapshots = (RemoveSnapshots) tx.expireSnapshots();
Review Comment:
You should be able to use `removeSnapshots(tx.table())`.
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -504,7 +519,12 @@ public void testRetainNLargerThanCurrentSnapshots() {
// Retain last 4 snapshots
Transaction tx = table.newTransaction();
- tx.expireSnapshots().expireOlderThan(t3).retainLast(4).commit();
+ RemoveSnapshots removeSnapshots = (RemoveSnapshots) tx.expireSnapshots();
Review Comment:
You should be able to use `removeSnapshots(tx.table())` for transaction
tests. You could also add another helper, but I think just passing in the
transaction table should work fine.
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -114,7 +122,9 @@ public void testExpireOlderThanWithDelete() {
Set<String> deletedFiles = Sets.newHashSet();
-
table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+ RemoveSnapshots removeSnapshots = (RemoveSnapshots)
table.expireSnapshots();
Review Comment:
Looks like this line was added by mistake?
##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -39,13 +39,21 @@
@RunWith(Parameterized.class)
public class TestRemoveSnapshots extends TableTestBase {
- @Parameterized.Parameters(name = "formatVersion = {0}")
+ private final boolean incrementalCleanup;
+
+ @Parameterized.Parameters(name = "formatVersion = {0}, incrementalCleanup =
{1}")
Review Comment:
Overall, the tests look correct to me. Just a couple minor things but no
blockers.
##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -326,319 +315,25 @@ public void commit() {
}
}
- private void cleanExpiredSnapshots() {
- // clean up the expired snapshots:
- // 1. Get a list of the snapshots that were removed
- // 2. Delete any data files that were deleted by those snapshots and are
not in the table
- // 3. Delete any manifests that are no longer used by current snapshots
- // 4. Delete the manifest lists
+ ExpireSnapshots withIncrementalCleanup(boolean useIncrementalCleanup) {
+ this.incrementalCleanup = useIncrementalCleanup;
+ return this;
+ }
+ private void cleanExpiredSnapshots() {
TableMetadata current = ops.refresh();
- Set<Long> validIds = Sets.newHashSet();
- for (Snapshot snapshot : current.snapshots()) {
- validIds.add(snapshot.snapshotId());
- }
-
- Set<Long> expiredIds = Sets.newHashSet();
- for (Snapshot snapshot : base.snapshots()) {
- long snapshotId = snapshot.snapshotId();
- if (!validIds.contains(snapshotId)) {
- // the snapshot was expired
- LOG.info("Expired snapshot: {}", snapshot);
- expiredIds.add(snapshotId);
- }
- }
-
- if (expiredIds.isEmpty()) {
- // if no snapshots were expired, skip cleanup
- return;
- }
-
- LOG.info("Committed snapshot changes; cleaning up expired manifests and
data files.");
-
- removeExpiredFiles(current.snapshots(), validIds, expiredIds);
- }
-
- @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"})
- private void removeExpiredFiles(
- List<Snapshot> snapshots, Set<Long> validIds, Set<Long> expiredIds) {
- // Reads and deletes are done using
Tasks.foreach(...).suppressFailureWhenFinished to complete
- // as much of the delete work as possible and avoid orphaned data or
manifest files.
-
- // this is the set of ancestors of the current table state. when removing
snapshots, this must
- // only remove files that were deleted in an ancestor of the current table
state to avoid
- // physically deleting files that were logically deleted in a commit that
was rolled back.
- Set<Long> ancestorIds =
- Sets.newHashSet(SnapshotUtil.ancestorIds(base.currentSnapshot(),
base::snapshot));
-
- Set<Long> pickedAncestorSnapshotIds = Sets.newHashSet();
- for (long snapshotId : ancestorIds) {
- String sourceSnapshotId =
-
base.snapshot(snapshotId).summary().get(SnapshotSummary.SOURCE_SNAPSHOT_ID_PROP);
- if (sourceSnapshotId != null) {
- // protect any snapshot that was cherry-picked into the current table
state
- pickedAncestorSnapshotIds.add(Long.parseLong(sourceSnapshotId));
- }
+ if (incrementalCleanup == null) {
+ incrementalCleanup = current.refs().size() == 1;
Review Comment:
Nice to see that this uses incremental if there's only one ref and
incremental isn't specifically set. Good idea.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]