rdblue commented on a change in pull request #802: Update RemoveSnapshots to
protect cherry-picked data files
URL: https://github.com/apache/incubator-iceberg/pull/802#discussion_r381444502
##########
File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
##########
@@ -188,16 +189,33 @@ private void cleanExpiredFiles(List<Snapshot> snapshots,
Set<Long> validIds, Set
// physically deleting files that were logically deleted in a commit that
was rolled back.
Set<Long> ancestorIds =
Sets.newHashSet(SnapshotUtil.ancestorIds(base.currentSnapshot(),
base::snapshot));
Review comment:
The current set of changes makes some assumptions that generally hold:
* Cherry-picked snapshots are newer than the original snapshot -- C comes
after B in time
* Expiration happens in order -- B is expired before C or in the same commit
I'm not yet sure how to handle more complex cases, like expiring a specific
snapshot. I think maybe we should just add a warning or deprecate that method.
I don't know of any current uses.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]