amogh-jahagirdar commented on code in PR #4578:
URL: https://github.com/apache/iceberg/pull/4578#discussion_r864396842
##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -161,21 +173,94 @@ public List<Snapshot> apply() {
private TableMetadata internalApply() {
this.base = ops.refresh();
+ if (cleanExpiredFiles && base.refs().size() > 1) {
+ throw new UnsupportedOperationException(
+ "Cannot perform snapshot expiration and have clean files set when
there is more than 1 ref");
+ }
Review Comment:
My doubt here is that say we have the following commit graph
A - B - C <- main
` - D <- test
If table.expireSnapshots().cleanExpiredFiles(false).commit() and A expires.
Let's also say the test branch reference also expires. Then we have the
following disconnected graph.
B - C <- main
D
If a user now does
table.expireSnapshots().cleanExpiredFiles(**true**).commit(), and say B should
be expired. There's no reference to D but because we are decoupling ref aging
from the snapshot aging the snapshot could still be retained. we would allow
the removal of files (only main ref exists at this point). This could lead to
deletion of files which are still referred to by D (if those files are not in C
they are in B). So it's an unsafe deletion.
So I think the correct check here isn't just if base.refs().size() > 1, I
think the correct check would be to also determine if there are any other
dangling snapshots. That would cover the disconnected graph case where there is
a missing reference to one previous lineage I mentioned. We could do that by
checking if there are any snapshots outside of the main lineage (only need to
do this check if main is the only reference).
@rdblue let me know if I'm missing something here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]