rdblue commented on code in PR #13614:
URL: https://github.com/apache/iceberg/pull/13614#discussion_r2220580612
##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy {
@Override
@SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"})
public void cleanFiles(TableMetadata beforeExpiration, TableMetadata
afterExpiration) {
- if (afterExpiration.refs().size() > 1) {
+ if (beforeExpiration.refs().size() > 1 || afterExpiration.refs().size() >
1) {
Review Comment:
Could this leak files if there is a rollback or a staged (but not
referenced) commit?
For instance, if you have snapshots like this:
```
A --- B --- C --- D (main)
`--- X (unreferenced)
```
If snapshot X should be cleaned up, Iceberg should not use the incremental
strategy. Also, if X is not removed but B and C are, then incremental also
can't be used because C may have removed files from X.
Instead of checking the number of refs, I think we need to check the history
of the main branch. If there is any snapshot outside of the main branch before
or after expiration, incremental cannot be used. There are cases where a ref is
in the main branch's history that are safe, but I would probably keep the
implementation simpler and avoid checking for things like branch overlap.
What I would do is:
1. If there is more than one ref after expiration, use the reference set
strategy (optionally, we could check that all refs are in the main branch's
history and allow it)
2. If there are any snapshots to be removed after expiration that were not
the main branch's ancestors, use the reference set strategy
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]