amogh-jahagirdar commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r864396842


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -161,21 +173,94 @@ public List<Snapshot> apply() {
 
   private TableMetadata internalApply() {
     this.base = ops.refresh();
+    if (cleanExpiredFiles && base.refs().size() > 1) {
+      throw new UnsupportedOperationException(
+          "Cannot perform snapshot expiration and have clean files set when 
there is more than 1 ref");
+    }

Review Comment:
   My doubt here is that say we have the following commit graph
   
   A - B - C  <- main
    ` - D  <- test
     
   If table.expireSnapshots().cleanExpiredFiles(false).commit() and A expires. 
Let's also say the test branch reference also expires. Then we have the 
following disconnected graph.
   
   B - C <- main
   D 
   
   If a user now does 
table.expireSnapshots().cleanExpiredFiles(**true**).commit(), and say B should 
be expired. There's no reference to D but because we are decoupling ref aging 
from the snapshot aging the snapshot could still be retained. we would allow 
the removal of files (only main ref exists at this point). This could lead to 
deletion of files which are still referred to by D (if those files are not in C 
they are in B). So it's an unsafe deletion.
   
   So I think the correct check here isn't just if base.refs().size() > 1, I 
think the correct check would be to determine if there are any other dangling 
snapshots. That would cover the disconnected graph case I mentioned.
   
   @rdblue let me know if I'm missing something here 
   
    
    
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to