rdblue commented on a change in pull request #3775:
URL: https://github.com/apache/iceberg/pull/3775#discussion_r772589004
##########
File path: core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java
##########
@@ -102,43 +102,35 @@ public static Snapshot oldestAncestor(Table table) {
}
/**
- * Traverses the history of the table's current snapshot and:
- * 1. returns null, if no snapshot exists or target timestamp is more recent than the current snapshot.
- * 2. else return the first snapshot which satisfies {@literal >=} targetTimestamp.
- * <p>
- * Given the snapshots (with timestamp): [S1 (10), S2 (11), S3 (12), S4 (14)]
- * <p>
- * firstSnapshotAfterTimestamp(table, x {@literal <=} 10) = S1
- * firstSnapshotAfterTimestamp(table, 11) = S2
- * firstSnapshotAfterTimestamp(table, 13) = S4
- * firstSnapshotAfterTimestamp(table, 14) = S4
- * firstSnapshotAfterTimestamp(table, x {@literal >} 14) = null
- * <p>
- * where x is the target timestamp in milliseconds and Si is the snapshot
+ * Traverses the history of the table's current snapshot and finds the first snapshot after the given timestamp.
*
* @param table a table
- * @param targetTimestampMillis a timestamp in milliseconds
- * @return the first snapshot which satisfies {@literal >=} targetTimestamp, or null if the current snapshot is
- * more recent than the target timestamp
+ * @param timestampMillis a timestamp in milliseconds
+ * @return the first snapshot after the given timestamp, or null if the current snapshot is older than the timestamp
+ * @throws IllegalStateException if the first ancestor after the given time can't be determined
*/
- public static Snapshot firstSnapshotAfterTimestamp(Table table, Long targetTimestampMillis) {
- Snapshot currentSnapshot = table.currentSnapshot();
- // Return null if no snapshot exists or target timestamp is more recent than the current snapshot
- if (currentSnapshot == null || currentSnapshot.timestampMillis() < targetTimestampMillis) {
+ public static Snapshot oldestAncestorAfter(Table table, long timestampMillis) {
+ if (table.currentSnapshot() == null) {
+ // there are no snapshots or ancestors
return null;
}
- // Return the oldest snapshot which satisfies >= targetTimestamp
Snapshot lastSnapshot = null;
for (Snapshot snapshot : currentAncestors(table)) {
- if (snapshot.timestampMillis() < targetTimestampMillis) {
+ if (snapshot.timestampMillis() <= timestampMillis) {
return lastSnapshot;
}
+
lastSnapshot = snapshot;
}
- // Return the oldest snapshot if the target timestamp is less than the oldest snapshot of the table
- return lastSnapshot;
+ if (lastSnapshot != null && lastSnapshot.parentId() == null) {
+ // this is the first snapshot in the table, return it
Review comment:
I see your point here, but the result is based on the table state that
gets passed in. If the table state is missing information, then we can't make
it consistent.
Here's another way to think about it:
```
t1 = commitSnapshotOne()
t2 = commitSnapshotTwo()
oldestAncestorAfter(table, Long.MIN_VALUE) // returns snapshot one
expireSnapshots(t2 - 1)
oldestAncestorAfter(table, Long.MIN_VALUE) // returns snapshot two
```
I think the behavior above is worse than throwing an exception based on
the table state because it is silently inconsistent. At least throwing an
exception tells you why it isn't returning the expected value.
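To make the difference concrete, here is a rough, self-contained sketch of the behavior with the exception in place. It does not use the Iceberg API: `Snap`, the `Map`-based table state, and the exception message are hypothetical stand-ins, and the loop only mirrors the logic in the diff above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OldestAncestorSketch {
  // Hypothetical stand-in for a snapshot: id, parent id (null for the first snapshot), commit time.
  static final class Snap {
    final long id;
    final Long parentId;
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  // Walks from the current snapshot back through parents that are still present in the table state.
  static List<Snap> currentAncestors(Map<Long, Snap> snapshots, Long currentId) {
    List<Snap> ancestors = new ArrayList<>();
    Long id = currentId;
    while (id != null && snapshots.containsKey(id)) {
      Snap snapshot = snapshots.get(id);
      ancestors.add(snapshot);
      id = snapshot.parentId;
    }
    return ancestors;
  }

  static Snap oldestAncestorAfter(Map<Long, Snap> snapshots, Long currentId, long timestampMillis) {
    if (currentId == null) {
      // there are no snapshots or ancestors
      return null;
    }

    Snap lastSnapshot = null;
    for (Snap snapshot : currentAncestors(snapshots, currentId)) {
      if (snapshot.timestampMillis <= timestampMillis) {
        return lastSnapshot;
      }
      lastSnapshot = snapshot;
    }

    if (lastSnapshot != null && lastSnapshot.parentId == null) {
      // this is the first snapshot in the table, return it
      return lastSnapshot;
    }

    // the oldest known ancestor has a parent that is no longer in the table state
    throw new IllegalStateException("Cannot determine the oldest ancestor after " + timestampMillis);
  }

  public static void main(String[] args) {
    Map<Long, Snap> snapshots = new HashMap<>();
    snapshots.put(1L, new Snap(1L, null, 10L));
    snapshots.put(2L, new Snap(2L, 1L, 20L));
    System.out.println(oldestAncestorAfter(snapshots, 2L, Long.MIN_VALUE).id); // snapshot one

    snapshots.remove(1L); // simulate expiring snapshot one
    try {
      oldestAncestorAfter(snapshots, 2L, Long.MIN_VALUE);
    } catch (IllegalStateException e) {
      System.out.println("failed instead of silently returning snapshot two: " + e.getMessage());
    }
  }
}
```

Because snapshot two still records a parent ID after expiration, the method can tell that history is missing and fail instead of returning a different answer for the same arguments.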
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.