rdblue commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r863337777


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -195,6 +299,18 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      TableMetadata updated = ops.refresh();
+      if (updated.refs() != null) {
+        List<SnapshotRef> branches = updated.refs()
+            .values().stream().filter(SnapshotRef::isBranch)
+            .collect(Collectors.toList());
+
+        if (branches.size() > 1) {
+          throw new UnsupportedOperationException(
+              "Deleting expired files when there is more than 1 branch is currently not supported");
+        }

Review Comment:
   The problem is that we have several cases that can't be reliably cleaned up 
without using the reference set. For example:
   
   ```
     A - B - C
     ` - D
   ```
   
   If A is aged off, there is no way to know that B, C, and D share all of the 
files from A. So expiring B could delete files that are referenced by D but not 
C.
   
   To take care of cases like this, we need to do reference set comparison. But 
rather than implementing that all at once here, we decided to fail snapshot 
expiration in the short term.
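   
   As a rough illustration of what reference set comparison would mean for the example above (a sketch only, not Iceberg's implementation: `ReferenceSetCleanup`, `filesSafeToDelete`, and the file names are hypothetical, and snapshots are modeled as a plain map from snapshot name to the files it references):
   
   ```java
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical model, not an Iceberg class.
   class ReferenceSetCleanup {
     /** Returns files referenced by expired snapshots and by no retained snapshot. */
     static Set<String> filesSafeToDelete(
         Map<String, Set<String>> filesBySnapshot, Set<String> expired) {
       Set<String> stillReferenced = new HashSet<>();
       Set<String> candidates = new HashSet<>();
       for (Map.Entry<String, Set<String>> entry : filesBySnapshot.entrySet()) {
         if (expired.contains(entry.getKey())) {
           // Files of expired snapshots are only candidates for deletion.
           candidates.addAll(entry.getValue());
         } else {
           // Retained snapshots on ANY branch keep their files alive.
           stillReferenced.addAll(entry.getValue());
         }
       }
       candidates.removeAll(stillReferenced);
       return candidates;
     }
   }
   ```
   
   With A = {f1}, B = {f1, f2, f5}, C = {f2, f3}, and D = {f1, f4}, expiring A and B leaves only f5 deletable. f1 survives because D still references it, even though C does not.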
   
   However, one thing does need to be fixed: this check should happen _before_ 
committing the changes. If we detect that we can't reliably clean up the table, 
we shouldn't drop the snapshots at all. So this check should come first in 
`commit()`. (@amogh-jahagirdar)
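   
   A minimal sketch of that ordering, using hypothetical stand-ins (`Ref`, `validateCleanupSupported`, and the `Runnable` hooks are assumptions for illustration, not Iceberg APIs): the branch check runs before anything is committed, so an unsupported table fails without losing any snapshots.
   
   ```java
   import java.util.List;

   // Hypothetical sketch, not the actual RemoveSnapshots code.
   class ExpireWithPrecheck {
     // Stand-in for a table reference; only the branch flag matters here.
     static final class Ref {
       final String name;
       final boolean isBranch;

       Ref(String name, boolean isBranch) {
         this.name = name;
         this.isBranch = isBranch;
       }
     }

     static void validateCleanupSupported(List<Ref> refs) {
       long branches = refs.stream().filter(ref -> ref.isBranch).count();
       if (branches > 1) {
         throw new UnsupportedOperationException(
             "Deleting expired files when there is more than 1 branch is currently not supported");
       }
     }

     static void commit(
         List<Ref> refs, boolean cleanExpiredFiles, Runnable commitChanges, Runnable cleanFiles) {
       if (cleanExpiredFiles) {
         // Fail first: nothing has been committed yet, so no snapshots are dropped.
         validateCleanupSupported(refs);
       }
       commitChanges.run();
       if (cleanExpiredFiles) {
         cleanFiles.run();
       }
     }
   }
   ```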



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

