rdblue commented on a change in pull request #388: Handle rollback in snapshot expiration
URL: https://github.com/apache/incubator-iceberg/pull/388#discussion_r314106205
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
 ##########
 @@ -120,56 +123,128 @@ public void commit() {
           }
         });
 
-    LOG.info("Committed snapshot changes; cleaning up expired manifests and data files.");
+    cleanExpiredSnapshots();
+  }
 
+  private void cleanExpiredSnapshots() {
     // clean up the expired snapshots:
     // 1. Get a list of the snapshots that were removed
     // 2. Delete any data files that were deleted by those snapshots and are not in the table
     // 3. Delete any manifests that are no longer used by current snapshots
     // 4. Delete the manifest lists
 
+    TableMetadata current = ops.refresh();
+
+    Set<Long> validIds = Sets.newHashSet();
+    for (Snapshot snapshot : current.snapshots()) {
+      validIds.add(snapshot.snapshotId());
+    }
+
+    Set<Long> expiredIds = Sets.newHashSet();
+    for (Snapshot snapshot : base.snapshots()) {
+      long snapshotId = snapshot.snapshotId();
+      if (!validIds.contains(snapshotId)) {
+        // the snapshot was expired
+        LOG.info("Expired snapshot: {}", snapshot);
+        expiredIds.add(snapshotId);
+      }
+    }
+
+    if (expiredIds.isEmpty()) {
+      // if no snapshots were expired, skip cleanup
+      return;
+    }
+
+    LOG.info("Committed snapshot changes; cleaning up expired manifests and data files.");
+
+    cleanExpiredFiles(current.snapshots(), validIds, expiredIds);
+  }
+
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
 
 Review comment:
   I tried to break this up as much as possible, but I don't think splitting this method any further is worth the changes. The loops produce too many intermediate sets that would need to be returned.
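
   For reference, the expired-ID step in the hunk above amounts to a set difference: every snapshot ID in the base metadata that is missing from the refreshed metadata. Below is a minimal standalone sketch of that step, assuming plain `java.util` collections in place of Iceberg's `Snapshot` and `TableMetadata` and Guava's `Sets`. The class name `ExpiredIdsSketch` and the method `expiredIds` are hypothetical and not part of the PR.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: it mirrors the validIds/expiredIds loops in the
// diff, but operates on bare snapshot IDs rather than Iceberg Snapshot objects.
final class ExpiredIdsSketch {
  private ExpiredIdsSketch() {
  }

  // Returns the IDs that appear in baseIds but not in currentIds, in base order.
  static Set<Long> expiredIds(List<Long> baseIds, List<Long> currentIds) {
    // IDs still present after the commit (the "valid" set in the diff)
    Set<Long> validIds = new LinkedHashSet<>(currentIds);

    Set<Long> expired = new LinkedHashSet<>();
    for (Long snapshotId : baseIds) {
      if (!validIds.contains(snapshotId)) {
        // the snapshot was in the base metadata but is gone now
        expired.add(snapshotId);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    List<Long> base = Arrays.asList(1L, 2L, 3L, 4L);
    List<Long> current = Arrays.asList(3L, 4L, 5L);
    System.out.println(expiredIds(base, current)); // prints [1, 2]
  }
}
```

   An empty result corresponds to the early return in the diff, where cleanup is skipped because no snapshots were expired.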

