rdblue commented on a change in pull request #3775:
URL: https://github.com/apache/iceberg/pull/3775#discussion_r772589004



##########
File path: core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java
##########
@@ -102,43 +102,35 @@ public static Snapshot oldestAncestor(Table table) {
   }
 
   /**
-   * Traverses the history of the table's current snapshot and:
-   * 1. returns null, if no snapshot exists or target timestamp is more recent than the current snapshot.
-   * 2. else return the first snapshot which satisfies {@literal >=} targetTimestamp.
-   * <p>
-   * Given the snapshots (with timestamp): [S1 (10), S2 (11), S3 (12), S4 (14)]
-   * <p>
-   * firstSnapshotAfterTimestamp(table, x {@literal <=} 10) = S1
-   * firstSnapshotAfterTimestamp(table, 11) = S2
-   * firstSnapshotAfterTimestamp(table, 13) = S4
-   * firstSnapshotAfterTimestamp(table, 14) = S4
-   * firstSnapshotAfterTimestamp(table, x {@literal >} 14) = null
-   * <p>
-   * where x is the target timestamp in milliseconds and Si is the snapshot
+   * Traverses the history of the table's current snapshot and finds the first snapshot after the given timestamp.
    *
    * @param table a table
-   * @param targetTimestampMillis a timestamp in milliseconds
-   * @return the first snapshot which satisfies {@literal >=} targetTimestamp, or null if the current snapshot is
-   * more recent than the target timestamp
+   * @param timestampMillis a timestamp in milliseconds
+   * @return the first snapshot after the given timestamp, or null if the current snapshot is older than the timestamp
+   * @throws IllegalStateException if the first ancestor after the given time can't be determined
    */
-  public static Snapshot firstSnapshotAfterTimestamp(Table table, Long targetTimestampMillis) {
-    Snapshot currentSnapshot = table.currentSnapshot();
-    // Return null if no snapshot exists or target timestamp is more recent than the current snapshot
-    if (currentSnapshot == null || currentSnapshot.timestampMillis() < targetTimestampMillis) {
+  public static Snapshot oldestAncestorAfter(Table table, long timestampMillis) {
+    if (table.currentSnapshot() == null) {
+      // there are no snapshots or ancestors
       return null;
     }
 
-    // Return the oldest snapshot which satisfies >= targetTimestamp
     Snapshot lastSnapshot = null;
     for (Snapshot snapshot : currentAncestors(table)) {
-      if (snapshot.timestampMillis() < targetTimestampMillis) {
+      if (snapshot.timestampMillis() <= timestampMillis) {
         return lastSnapshot;
       }
+
       lastSnapshot = snapshot;
     }
 
-    // Return the oldest snapshot if the target timestamp is less than the oldest snapshot of the table
-    return lastSnapshot;
+    if (lastSnapshot != null && lastSnapshot.parentId() == null) {
+      // this is the first snapshot in the table, return it

Review comment:
       I see your point here, but the result is based on the table state that 
gets passed in. If the table state is missing information, then we can't make 
it consistent.
   
   Here's another way to think about it:
   
   ```
   t1 = commitSnapshotOne()
   t2 = commitSnapshotTwo()
   oldestAncestorAfter(table, Long.MIN_VALUE) // returns snapshot one
   expireSnapshots(t2 - 1)
   oldestAncestorAfter(table, Long.MIN_VALUE) // returns snapshot two
   ```
   
   I think that the behavior above is worse than throwing an exception based on 
the table state because it is silently inconsistent. At least throwing an 
exception tells you why it isn't returning the expected value.
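
To make the comparison concrete, here is a rough, self-contained sketch of the throwing variant. It does not use the Iceberg API: `AncestorSketch`, `Snap`, and the `Map`-based metadata are hypothetical stand-ins, and the `throw` at the end is my assumption about the part of the method that the diff above cuts off, based only on the new `@throws` Javadoc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for table metadata; none of these names are Iceberg classes.
class AncestorSketch {
  static final class Snap {
    final long id;
    final Long parentId; // null only for the first snapshot ever committed
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  // Walks from the current snapshot toward the root and stops at the first
  // parent that is no longer present in metadata (for example, after expiration).
  static List<Snap> currentAncestors(Map<Long, Snap> known, Long currentId) {
    List<Snap> ancestors = new ArrayList<>();
    Long id = currentId;
    while (id != null && known.containsKey(id)) {
      Snap snap = known.get(id);
      ancestors.add(snap);
      id = snap.parentId;
    }
    return ancestors;
  }

  static Snap oldestAncestorAfter(Map<Long, Snap> known, Long currentId, long timestampMillis) {
    if (currentId == null) {
      return null; // there are no snapshots or ancestors
    }

    Snap last = null;
    for (Snap snap : currentAncestors(known, currentId)) {
      if (snap.timestampMillis <= timestampMillis) {
        return last;
      }
      last = snap;
    }

    if (last != null && last.parentId == null) {
      return last; // this is the first snapshot in the table
    }

    // Assumed behavior: history ran out before reaching the timestamp or the root.
    throw new IllegalStateException(
        "Cannot determine oldest ancestor after " + timestampMillis + ": history is incomplete");
  }

  public static void main(String[] args) {
    Map<Long, Snap> known = new HashMap<>();
    known.put(1L, new Snap(1L, null, 100L));
    known.put(2L, new Snap(2L, 1L, 200L));
    System.out.println(oldestAncestorAfter(known, 2L, Long.MIN_VALUE).id);

    known.remove(1L); // simulates expireSnapshots(t2 - 1)
    try {
      oldestAncestorAfter(known, 2L, Long.MIN_VALUE);
    } catch (IllegalStateException e) {
      System.out.println("failed loudly: " + e.getMessage());
    }
  }
}
```

With the full history the call resolves to snapshot one. Once snapshot one is removed, the same call fails with an explanation instead of quietly returning snapshot two.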




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


