amogh-jahagirdar commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r864255537


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -195,6 +299,18 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      TableMetadata updated = ops.refresh();
+      if (updated.refs() != null) {
+        List<SnapshotRef> branches = updated.refs()
+            .values().stream().filter(SnapshotRef::isBranch)
+            .collect(Collectors.toList());
+
+        if (branches.size() > 1) {
+          throw new UnsupportedOperationException(
+              "Deleting expired files when there is more than 1 branch is 
currently not supported");
+        }

Review Comment:
    I re-read and thought about @rdblue example. The problem is I was assuming 
there would be another branch and so I was thinking we would always be able to 
tell when we can't delete the files just based off the number of refs but that 
we could remove the snapshots themselves.
   
   My assumption there isn't valid; the 2nd branch's ref could be aged off 
independently and there would then be an isolated snapshot lineage. Then later 
on when the procedure is run  we have 
   
   ` B - C
      D
   `
   
   There's one ref (let's say the B-C lineage is main). When we go to expire 
there still could be files that B and C does not refer to but D does. There is 
no longer a 2nd branch, so the removal of files would go through and delete 
files which D has. 
   
   So then yes, we must prevent the actual snapshot removal operation from 
going through in case of multiple refs.



##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -195,6 +299,18 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      TableMetadata updated = ops.refresh();
+      if (updated.refs() != null) {
+        List<SnapshotRef> branches = updated.refs()
+            .values().stream().filter(SnapshotRef::isBranch)
+            .collect(Collectors.toList());
+
+        if (branches.size() > 1) {
+          throw new UnsupportedOperationException(
+              "Deleting expired files when there is more than 1 branch is 
currently not supported");
+        }

Review Comment:
    I re-read and thought about @rdblue example. The problem is I was assuming 
there would be another branch and so I was thinking we would always be able to 
tell when we can't delete the files just based off the number of refs but that 
we could remove the snapshots themselves.
   
   My assumption there isn't valid; the 2nd branch's ref could be aged off 
independently and there would then be an isolated snapshot lineage. Then later 
on when the procedure is run  we have 
   
   ` B - C
   
      D
   `
   
   There's one ref (let's say the B-C lineage is main). When we go to expire 
there still could be files that B and C does not refer to but D does. There is 
no longer a 2nd branch, so the removal of files would go through and delete 
files which D has. 
   
   So then yes, we must prevent the actual snapshot removal operation from 
going through in case of multiple refs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to