rdblue opened a new pull request #3775:
URL: https://github.com/apache/iceberg/pull/3775


   The util class `SnapshotUtil` is shared, so it needs to have fairly strict 
method contracts. This replaces `SnapshotUtil.firstSnapshotAfterTimestamp` with 
`oldestAncestorAfter`:
   * The new implementation throws `IllegalStateException` if the correct 
ancestor cannot be determined
   * The new name makes it clear that only ancestors of the current snapshot 
are considered, not all snapshots
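
   As a rough illustration of the contract described above, here is a minimal 
sketch. It is not the code in this PR: `AncestorSketch` and `Snap` are 
stand-in types invented for the example. The newest-first ordering, the 
treatment of an exact timestamp match, and the null return for a timestamp 
newer than the current snapshot are all assumptions.

```java
import java.util.List;

// Stand-in types for illustration only; these are not Iceberg's classes.
final class AncestorSketch {
  static final class Snap {
    final long id;
    final Long parentId; // null for the first snapshot in the table
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  // Assumes ancestors are ordered newest first, starting at the current snapshot.
  static Snap oldestAncestorAfter(List<Snap> ancestorsNewestFirst, long timestampMillis) {
    Snap last = null;
    for (Snap snap : ancestorsNewestFirst) {
      if (snap.timestampMillis < timestampMillis) {
        // Everything from here back is too old. If even the current snapshot
        // is too old, last is still null and null is returned.
        return last;
      } else if (snap.timestampMillis == timestampMillis) {
        return snap;
      }
      last = snap;
    }

    if (last == null) {
      return null; // no snapshots at all
    }
    if (last.parentId == null) {
      return last; // reached the first snapshot, so the answer is certain
    }
    // The known history ended before reaching the timestamp, so an older
    // qualifying ancestor may have existed and been removed.
    throw new IllegalStateException(
        "Cannot determine oldest ancestor after " + timestampMillis);
  }
}
```

   The point of the sketch is the last branch: when the walk runs out of 
ancestors without reaching either the timestamp or a snapshot with no parent, 
the method refuses to guess and throws.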
   
   This also updates the places that called `firstSnapshotAfterTimestamp` and 
attempts to preserve their behavior by catching the `IllegalStateException` 
(cannot determine ancestor) and using the oldest ancestor instead. However, in 
updating the Spark streaming code, I noticed a few bugs:
   * `planFiles` calls `.snapshotId()` without checking the snapshot, which 
can be null if the timestamp is newer than the current snapshot, resulting in a 
`NullPointerException`
   * The initial offset store checks for a future timestamp, but `planFiles` 
does not
   * The initial offset store handles null `fromTimestamp`, but `planFiles` 
does not
   * The `initialFutureStartOffset` creates an offset after the current 
snapshot, but not necessarily after the given future time
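
   The caller-side pattern (fall back to the oldest ancestor on 
`IllegalStateException`, and do not dereference a null result) could look 
roughly like the sketch below. Everything in it is hypothetical: snapshots are 
reduced to nullable IDs, the lookup is passed in as a function, and 
`StartSnapshotSketch` and `startingSnapshotId` are names made up for the 
example.

```java
import java.util.function.LongFunction;

// Hypothetical helper for illustration; snapshots are represented by nullable IDs.
final class StartSnapshotSketch {
  /**
   * Returns the ID of the snapshot to start from, or null if none can be
   * chosen yet (empty table, or a timestamp newer than the current snapshot).
   */
  static Long startingSnapshotId(
      LongFunction<Long> oldestAncestorAfter, Long oldestAncestorId, Long fromTimestamp) {
    if (fromTimestamp == null) {
      // No timestamp configured: start from the oldest ancestor, if there is one.
      return oldestAncestorId;
    }

    try {
      // May legitimately return null; the caller must check before using the result.
      return oldestAncestorAfter.apply(fromTimestamp);
    } catch (IllegalStateException e) {
      // The correct ancestor could not be determined, so use the oldest one known.
      return oldestAncestorId;
    }
  }
}
```

   A caller would then test the result for null before asking for anything 
that depends on a snapshot, instead of chaining a call directly on the 
returned value.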
   
   This PR also attempts to fix those issues. It simplifies the code by 
removing several static helper methods. These helped readability, but they 
assumed that the table has a current snapshot and so were prone to 
`NullPointerException`s. Instead, this adds `determineStartingOffset`, which is 
responsible for finding a starting offset, or returning 
`StreamingOffset.START_OFFSET` if one cannot be determined because of the table 
state or a timestamp in the future.
   
   Now, `StreamingOffset.START_OFFSET` means that the job cannot start because 
the start offset has not been determined, and there is no "initial future 
offset".
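
   To make the intended behavior concrete, here is a simplified sketch of how 
a `determineStartingOffset`-style method might decide between a real offset 
and the sentinel. It is not the implementation in this PR: `Snap` and `Offset` 
are stand-in types, the sentinel's field values are placeholders, and the 
ancestor list is assumed to be complete and ordered newest first (so the 
`IllegalStateException` fallback is left out).

```java
import java.util.List;

// Stand-in types for illustration only; these are not Iceberg's classes.
final class StartingOffsetSketch {
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  static final class Offset {
    // Placeholder sentinel meaning "the start offset has not been determined".
    static final Offset START_OFFSET = new Offset(-1L, -1L);

    final long snapshotId;
    final long position;

    Offset(long snapshotId, long position) {
      this.snapshotId = snapshotId;
      this.position = position;
    }
  }

  static Offset determineStartingOffset(List<Snap> ancestorsNewestFirst, Long fromTimestamp) {
    if (ancestorsNewestFirst.isEmpty()) {
      return Offset.START_OFFSET; // table has no current snapshot
    }

    Snap current = ancestorsNewestFirst.get(0);
    Snap oldest = ancestorsNewestFirst.get(ancestorsNewestFirst.size() - 1);

    if (fromTimestamp == null) {
      return new Offset(oldest.id, 0L); // no timestamp: start from the beginning
    }
    if (current.timestampMillis < fromTimestamp) {
      return Offset.START_OFFSET; // timestamp is in the future: cannot start yet
    }

    // Walk back from the current snapshot while snapshots are not older than the timestamp.
    Snap start = current;
    for (Snap snap : ancestorsNewestFirst) {
      if (snap.timestampMillis < fromTimestamp) {
        break;
      }
      start = snap;
    }
    return new Offset(start.id, 0L);
  }
}
```

   In this shape every path that cannot name a concrete snapshot returns the 
same sentinel, so a caller only has to compare against `START_OFFSET` to know 
that it should try again later.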


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


