amogh-jahagirdar commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r864396842


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -161,21 +173,94 @@ public List<Snapshot> apply() {
 
   private TableMetadata internalApply() {
     this.base = ops.refresh();
+    if (cleanExpiredFiles && base.refs().size() > 1) {
+      throw new UnsupportedOperationException(
+          "Cannot perform snapshot expiration and have clean files set when 
there is more than 1 ref");
+    }

Review Comment:
   My doubt here is that say we have the following commit graph
   
   A - B - C 
    ` - D 
    
    A - B - C is main, A - D is test
    
   If table.expireSnapshots().cleanExpiredFiles(false).commit() and A expires. 
Let's also say the test branch reference also expires. Then we have the 
following disconnected graph.
   
   B - C <main>
   D 
   
   If a user now does 
table.expireSnapshots().cleanExpiredFiles(**true**).commit(), and say B should 
be expired. There's no reference to D but because we are decoupling ref aging 
from the snapshot aging it could still be retained. we would allow the removal 
of files (only main ref exists at this point). This could lead to deletion of 
files which are still referred to by D (if those files are not in C they are in 
B). So it's an unsafe deletion.
   
   @rdblue let me know if I'm missing something here 
   
    
    
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to