amogh-jahagirdar commented on code in PR #5666:
URL: https://github.com/apache/iceberg/pull/5666#discussion_r957643199


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -366,11 +367,19 @@ private void removeExpiredFiles(
     // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
     // as much of the delete work as possible and avoid orphaned data or manifest files.
 
-    // this is the set of ancestors of the current table state. when removing snapshots, this must
-    // only remove files that were deleted in an ancestor of the current table state to avoid
+    // ToDo: This will be removed when reachability analysis is done so files across multiple
+    // branches can be removed
+    SnapshotRef branchToCleanup = Iterables.getFirst(base.refs().values(), null);

Review Comment:
   My thinking is the following:
   
   1.) Logically, a tagged snapshot must exist on either a.) a non-main branch or b.) the main branch.
   2.) If the tag exists on main, file cleanup couldn't be done in the first place (main cannot age off, so there would be multiple refs), so this point would never be reached.
   3.) If the tag exists on a non-main branch and that branch ages off while the tagged snapshot is retained, the tag effectively becomes the "tip" of a lineage, and the expiration logic works as expected. If the non-main branch is still retained, we never reach this point (same as case 2, except that the other ref is the non-main branch).
   
   Combined with the fact that writes cannot be performed on tags, this leads me to believe that, for the purpose of expiration, there's no need to differentiate between tags and branches.
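   To make the "no need to differentiate" point concrete, here is a sketch of computing the ancestor set (the set the removed comment in the diff refers to) from whichever ref is selected. Only the ref's snapshot ID is used, so a tag and a branch pointing at the same snapshot yield identical ancestors. The `Ref` record and the child-to-parent `Map<Long, Long>` are hypothetical simplifications; in practice the lineage comes from each snapshot's parent ID in the table metadata:

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; not Iceberg's SnapshotRef API.
enum RefType { BRANCH, TAG }

record Ref(String name, RefType type, long snapshotId) {}

final class Lineage {
  private Lineage() {}

  // Walks child -> parent links starting at the ref's snapshot and returns
  // every snapshot ID on that lineage, including the starting one.
  // ref.type() is never consulted, so tags and branches behave identically.
  static Set<Long> ancestorIds(Ref ref, Map<Long, Long> parentOf) {
    Set<Long> ancestors = new LinkedHashSet<>();
    Long current = ref.snapshotId();
    // add() returns false on a repeat, which also guards against cycles
    while (current != null && ancestors.add(current)) {
      current = parentOf.get(current); // null once the root snapshot is reached
    }
    return ancestors;
  }
}
```

   For a tag `v1` and a branch `audit` both at snapshot 3 with lineage 3 -> 2 -> 1, both calls return the IDs {3, 2, 1}.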
   
   I could rename this to refToCleanup if that makes more sense to folks. But the only case where this is a tag is the one I mentioned in 3.), where it's just a "dangling" snapshot referenced by a tag. @namrathamyske @rdblue @jackye1995 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
