amogh-jahagirdar commented on code in PR #5364:
URL: https://github.com/apache/iceberg/pull/5364#discussion_r930621186


##########
core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java:
##########
@@ -322,6 +323,39 @@ public static long snapshotIdAsOfTime(Table table, long timestampMillis) {
     return snapshotId;
   }
 
+  /**
+   * Returns the ID of the most recent snapshot on the given branch
+   * as of the given time in milliseconds
+   *
+   * @param table a {@link Table}
+   * @param branch a {@link String}
+   * @param timestampMillis the timestamp in millis since the Unix epoch
+   * @return the snapshot ID
+   * @throws IllegalArgumentException when no snapshot is found in the table, on the given branch
+   * older than the timestamp
+   */
+  public static long snapshotIdAsOfTime(Table table, String branch, long timestampMillis) {
+    SnapshotRef ref = table.refs().get(branch);
+    Preconditions.checkArgument(ref != null, "Branch %s does not exist", branch);
+    Preconditions.checkArgument(ref.isBranch(), "Ref %s is a tag, not a branch", branch);
+    Long snapshotId = null;
+    long minimumTimeDifference = Long.MAX_VALUE;
+    for (Snapshot snapshot : ancestorsOf(ref.snapshotId(), table::snapshot)) {
+      if (snapshot.timestampMillis() <= timestampMillis) {
+        if (timestampMillis - snapshot.timestampMillis() <= minimumTimeDifference) {
+          minimumTimeDifference = timestampMillis - snapshot.timestampMillis();
+          snapshotId = snapshot.snapshotId();
+        }
+      }

Review Comment:
   This does not seem right in the presence of staged commits (WAP). A staged commit on a branch should not be included in time travel, but traversing `ancestorsOf` would include it. I think part of the reason time travel relies on the snapshot log, rather than on traversing the table's snapshot ancestors, is that the snapshot log is the source of truth for how the table (main) state evolved over time. Walking ancestors and comparing timestamps would count staged commits, which we don't want: we should only count snapshots that are part of that branch's state.

   Fixing this may require maintaining extra metadata: instead of a single snapshot log for the main table state, a `map<string, list of history entries>` where the key is the ref name and the value is that ref's history log. When a snapshot is produced on a branch and is not staged, we would update this metadata.
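
   To make the per-ref log idea concrete, here is a minimal, self-contained sketch. It does not use any Iceberg classes: `RefHistorySketch`, `HistoryEntry`, `recordCommit`, and the in-memory map are all hypothetical names for illustration, and it assumes entries are appended in timestamp order. The lookup keeps the last entry at or before the requested time, much like walking a snapshot log in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-ref history log; not part of the Iceberg API. */
class RefHistorySketch {

  /** One log entry: the time at which a snapshot became the current state of a ref. */
  static final class HistoryEntry {
    final long timestampMillis;
    final long snapshotId;

    HistoryEntry(long timestampMillis, long snapshotId) {
      this.timestampMillis = timestampMillis;
      this.snapshotId = snapshotId;
    }
  }

  // Ref name -> history entries, in the order they were appended (assumed time-ordered).
  private final Map<String, List<HistoryEntry>> refLogs = new HashMap<>();

  /** Records a committed (non-staged) snapshot on a ref. Staged snapshots are never recorded. */
  public void recordCommit(String ref, long timestampMillis, long snapshotId) {
    refLogs.computeIfAbsent(ref, k -> new ArrayList<>())
        .add(new HistoryEntry(timestampMillis, snapshotId));
  }

  /** Returns the snapshot that was current on the ref as of the given time. */
  public long snapshotIdAsOfTime(String ref, long timestampMillis) {
    List<HistoryEntry> log = refLogs.get(ref);
    if (log == null) {
      throw new IllegalArgumentException("No history for ref " + ref);
    }

    Long snapshotId = null;
    for (HistoryEntry entry : log) {
      if (entry.timestampMillis <= timestampMillis) {
        // Later entries overwrite earlier ones, so the last match wins.
        snapshotId = entry.snapshotId;
      }
    }

    if (snapshotId == null) {
      throw new IllegalArgumentException(
          "No snapshot on ref " + ref + " at or before " + timestampMillis);
    }
    return snapshotId;
  }
}
```

   With this shape, a snapshot staged at time 200 but never recorded on the ref is invisible to a lookup at time 250, which resolves to whatever was committed before it. Where such a log would live in table metadata, and how it would be kept in sync, is left open here.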
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

