rdblue commented on a change in pull request #497: Support retaining last N
snapshots
URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r331590649
##########
File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
##########
@@ -77,8 +79,34 @@ public ExpireSnapshots expireSnapshotId(long
expireSnapshotId) {
@Override
public ExpireSnapshots expireOlderThan(long timestampMillis) {
- LOG.info("Expiring snapshots older than: {} ({})", new
Date(timestampMillis), timestampMillis);
- this.expireOlderThan = timestampMillis;
+ updateExpireOlderThan(timestampMillis, true);
+ return this;
+ }
+
+ private boolean updateExpireOlderThan(long timestampMillis, boolean
fromExpireOlderThan) {
+ if (expireOlderThan == null ||
+ expireOlderThan > timestampMillis ||
+ this.isExpireOlderThanSet == fromExpireOlderThan) {
+ LOG.info("Expiring snapshots older than: {} ({})", new
Date(timestampMillis), timestampMillis);
+ this.expireOlderThan = timestampMillis;
+ this.isExpireOlderThanSet = fromExpireOlderThan;
+ return true;
+ }
+ return false;
+ }
+
+ @Override
+ public ExpireSnapshots retainLast(int numSnapshots) {
Review comment:
I think the current implementation is confusing. The nth snapshot is found
by traversing parents of the current snapshot, then any newer snapshots are
kept. If I have a recently staged snapshot and run `retainLast(3)`, then the
number of retained snapshots will be 4. The description does not document this
behavior.
This also requires coordinating `expireOlderThan` and `retainLast`, and that
makes this more complicated. I think that these two should be independent.
Behavior should be:
* All snapshots older than the `expireOlderThan` time are removed, _unless_
. . .
* A snapshot is one of the most recent `numSnapshots` from `retainLast`
I think this method should build a set of snapshot IDs that are to be
retained that are the most recent, ignoring parentIds.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]