zhengzengprc commented on a change in pull request #3039:
URL: https://github.com/apache/iceberg/pull/3039#discussion_r700639354



##########
File path: core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java
##########
@@ -65,12 +65,16 @@ public static boolean ancestorOf(Table table, long 
snapshotId, long ancestorSnap
   }
 
   /**
-   * Traverses the history of the table's current snapshot and finds the 
oldest Snapshot.
-   * @return null if there is no current snapshot in the table, else the 
oldest Snapshot.
+   * Traverses the history of the table's current snapshot and finds the 
oldest Snapshot after the timestamp.
+   * @return null if there is no current snapshot in the table, else the 
oldest Snapshot after the timestamp.

Review comment:
       Hello @kbendick , thanks for bringing up the questions. That's a really 
good question for the strictly greater than the passed in timestamp.
   Honestly, I don't have a strong reason use strictly greater. IMO, the 
function should include the "equal" case while user passes in the existing 
timestamp that exists.
   From the sample:
   let's say we have snapshot with time series: 10,20,30,40,50
   Current snapshot will be with timestamp 50.
   The input timestamp is 30.
   I think 30 should be the expected answer from the function calls.
   The boolean logic should include the "=" case,
   `while (current.parentId() != null && 
table.snapshot(current.parentId()).timestampMillis() >= timestamp)`
   
   I try to reach @SreeramGarlapati to see if we have a strong reason put 
"strictly greater", if I miss something here. But I am still waiting input from 
Sreeram.
   @SreeramGarlapati do you know anything behind the scene that we put the 
"strictly greater"?
   Personally I prefer the ">=" approach.
   Beside the reason of `as-of-timestamp`. I think the function return should 
be consistent, let's take back above example to go through current 
implementation:
   Case 1:
   Snapshot with time series: 10,20,30,40,50
   Current timestamp: 50
   Input timestamp: 50
   Function return 50
   
   Case 2:
   Snapshot with time series: 10,20,30,40,50
   Current timestamp: 50
   Input timestamp: 40
   Function return 50
   
   The result looks like inconsistent to me, since case 1 returns inclusive 50, 
but case 2 not. IMO, these logic should be consistent with each other




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to