rdblue commented on a change in pull request #497: Support retaining last N 
snapshots
URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r331590649
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
 ##########
 @@ -77,8 +79,34 @@ public ExpireSnapshots expireSnapshotId(long 
expireSnapshotId) {
 
   @Override
   public ExpireSnapshots expireOlderThan(long timestampMillis) {
-    LOG.info("Expiring snapshots older than: {} ({})", new 
Date(timestampMillis), timestampMillis);
-    this.expireOlderThan = timestampMillis;
+    updateExpireOlderThan(timestampMillis, true);
+    return this;
+  }
+
+  private boolean updateExpireOlderThan(long timestampMillis, boolean 
fromExpireOlderThan) {
+    if (expireOlderThan == null ||
+        expireOlderThan > timestampMillis ||
+        this.isExpireOlderThanSet == fromExpireOlderThan) {
+      LOG.info("Expiring snapshots older than: {} ({})", new 
Date(timestampMillis), timestampMillis);
+      this.expireOlderThan = timestampMillis;
+      this.isExpireOlderThanSet = fromExpireOlderThan;
+      return true;
+    }
+    return false;
+  }
+
+  @Override
+  public ExpireSnapshots retainLast(int numSnapshots) {
 
 Review comment:
   I think the current implementation is confusing. The nth snapshot is found 
by traversing parents of the current snapshot, then any newer snapshots are 
kept. If I have a recently staged snapshot and run `retainLast(3)`, then the 
number of retained snapshots will be 4. The description does not document this 
behavior.
   
   This also requires coordinating `expireOlderThan` and `retainLast`, and that 
makes this more complicated. I think that these two should be independent. 
Behavior should be:
   * All snapshots older than the `expireOlderThan` time are removed, _unless_ 
. . .
   * A snapshot is one of the most recent `numSnapshots` from `retainLast`
   
   I think this method should build a set of snapshot IDs that are to be 
retained that are the most recent, ignoring parentIds.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to