amogh-jahagirdar commented on code in PR #5666:
URL: https://github.com/apache/iceberg/pull/5666#discussion_r957643199
##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -366,11 +367,19 @@ private void removeExpiredFiles(
// Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
// as much of the delete work as possible and avoid orphaned data or manifest files.
- // this is the set of ancestors of the current table state. when removing snapshots, this must
- // only remove files that were deleted in an ancestor of the current table state to avoid
+ // ToDo: This will be removed when reachability analysis is done so files across multiple
+ // branches can be removed
+ SnapshotRef branchToCleanup = Iterables.getFirst(base.refs().values(), null);
Review Comment:
My thinking is the following:
1.) Logically, a tagged snapshot has to exist on either a.) a non-main branch
or b.) the main branch.
2.) If the tag exists on main, file cleanup couldn't be done in the first
place (main cannot age off, so we'd have multiple refs), so this point
wouldn't have been reached.
3.) If the tag exists on a non-main branch and that branch ages off before the
tagged snapshot, which is retained, then the tag ends up being the de facto
"tip" of a lineage, in which case the expiration logic would work as expected.
If the non-main branch is still retained, then we wouldn't reach this point
(same case as 2, just that the other ref is the non-main branch).
Combining this with the fact that writes cannot be performed on tags leads
me to believe that, for the purpose of expiration, there's no need to
differentiate tags and branches.
I could call this refToCleanup if that makes more sense to folks? But the
only case where this is a tag is the one I mentioned in 3.), in which case
it's just a "dangling" snapshot that is referenced by a tag. @namrathamyske
@rdblue @jackye1995
Also, let me know if there's a flaw in my logic.
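To make the cases above concrete, here is a minimal, self-contained sketch of the reasoning. It is not the PR's code: `RefCleanupSketch`, `Ref`, `RefType`, and `refToCleanup` are hypothetical names, and the "exactly one retained ref" guard is an assumption drawn from points 2 and 3 rather than something shown in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RefCleanupSketch {
  // Hypothetical stand-ins for illustration; these are not Iceberg's SnapshotRef types.
  enum RefType { BRANCH, TAG }

  static final class Ref {
    final RefType type;
    final long snapshotId;

    Ref(RefType type, long snapshotId) {
      this.type = type;
      this.snapshotId = snapshotId;
    }
  }

  /**
   * Returns the single ref whose lineage would be cleaned up, or null when zero or
   * several refs are retained (assumed here to mean file cleanup is skipped).
   * The ref type is deliberately ignored: a lone tag is treated as the tip of its lineage.
   */
  static Ref refToCleanup(Map<String, Ref> retainedRefs) {
    if (retainedRefs.size() != 1) {
      return null;
    }
    return retainedRefs.values().iterator().next();
  }

  public static void main(String[] args) {
    // Case 2: a tag on main. Both refs are retained, so cleanup is not reached.
    Map<String, Ref> tagOnMain = new LinkedHashMap<>();
    tagOnMain.put("main", new Ref(RefType.BRANCH, 3L));
    tagOnMain.put("release-tag", new Ref(RefType.TAG, 2L));
    System.out.println("tag on main -> cleanup ref: " + refToCleanup(tagOnMain));

    // Case 3: the only retained ref is a tag, so it is handled like a branch tip.
    Map<String, Ref> danglingTag = new LinkedHashMap<>();
    danglingTag.put("audit-tag", new Ref(RefType.TAG, 7L));
    Ref ref = refToCleanup(danglingTag);
    System.out.println("dangling tag -> cleanup from snapshot " + ref.snapshotId);
  }
}
```

Because the sketch never branches on `RefType`, renaming the variable to `refToCleanup` would describe it accurately in both the branch and the dangling-tag case.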
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]