amogh-jahagirdar commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r864255537
##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -195,6 +299,18 @@ public void commit() {
LOG.info("Committed snapshot changes");
if (cleanExpiredFiles) {
+ TableMetadata updated = ops.refresh();
+ if (updated.refs() != null) {
+ List<SnapshotRef> branches = updated.refs()
+ .values().stream().filter(SnapshotRef::isBranch)
+ .collect(Collectors.toList());
+
+ if (branches.size() > 1) {
+ throw new UnsupportedOperationException(
+ "Deleting expired files when there is more than 1 branch is
currently not supported");
+ }
Review Comment:
I re-read and thought about @rdblue example. The problem is I was assuming
there would be another branch and so I was thinking we would always be able to
tell when we can't delete the files just based off the number of refs but that
we could remove the snapshots themselves.
My assumption there isn't valid; the 2nd branch's ref could be aged off
independently and there would then be an isolated snapshot lineage. Let's say A
gets aged out. Then later on when the procedure is run a 2nd time we have
` B - C
D`
There's one ref (let's say the B-C lineage is main). When we go to expire
there still could be files that B and C does not refer to but D does. There is
no longer a 2nd branch, so the removal of files would go through and delete
files which D has whic his not what we want.
So then yes, we must prevent the actual snapshot removal operation from
going through in case of multiple refs and in case the user wants to clean the
data. Also from an API contract perspective for ExpireSnapshots this seems
cleaner as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]