rdblue opened a new pull request #3775: URL: https://github.com/apache/iceberg/pull/3775
The util class `SnapshotUtil` is shared, so it needs to have fairly strict method contracts. This replaces `SnapshotUtil.firstSnapshotAfterTimestamp` with `oldestAncestorAfter`:

* The new implementation throws `IllegalStateException` if the correct ancestor cannot be determined
* The new name makes clear that the snapshots considered are ancestors, not all snapshots

This also updates the places that called `firstSnapshotAfterTimestamp` and attempts to preserve the same behavior by catching the `IllegalStateException` (cannot determine ancestor) and using the oldest ancestor instead.

However, in updating the Spark streaming code, I noticed a few bugs:

* `planFiles` calls `.snapshotId()` without checking the snapshot, which can be null if the timestamp is newer than the current snapshot, resulting in a `NullPointerException`
* The initial offset store checks for a future timestamp, but `planFiles` does not
* The initial offset store handles a null `fromTimestamp`, but `planFiles` does not
* `initialFutureStartOffset` creates an offset after the current snapshot, but not necessarily after the given future time

I'm also attempting to fix those issues.

This simplifies the code by removing several static helper methods. These helped readability, but made assumptions about whether the table has a current snapshot and so were prone to `NullPointerException`s. Instead, this adds `determineStartingOffset`, which is responsible for getting a starting offset or returning `StreamingOffset.START_OFFSET` if the offset cannot be determined because of the table state or a timestamp in the future. Now, `StreamingOffset.START_OFFSET` means that the job cannot start because the start offset has not been determined, and there is no "initial future offset".
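To make the contract described above concrete, here is a minimal, self-contained sketch of one way to read it. It is not the code in this PR: the `Snap` class, the map-based parent lookup, and the helpers `oldestKnownAncestor` and `startingSnapshot` are hypothetical stand-ins, and the sketch assumes that an ancestor missing from the map means its metadata is no longer available (for example, because it was expired).

```java
import java.util.HashMap;
import java.util.Map;

public class OldestAncestorSketch {
  /** Hypothetical stand-in for a table snapshot; parentId is null for the first snapshot. */
  static final class Snap {
    final long id;
    final Long parentId;
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * Walks back from current and returns the oldest ancestor committed at or after the timestamp.
   * Returns null if there is no current snapshot or the timestamp is newer than current.
   * Throws IllegalStateException if history is cut off before the answer is known.
   */
  static Snap oldestAncestorAfter(Map<Long, Snap> byId, Snap current, long timestampMillis) {
    Snap last = null;
    Snap snap = current;
    while (snap != null) {
      if (snap.timestampMillis < timestampMillis) {
        return last; // null when even the current snapshot is older than the timestamp
      }
      last = snap;
      snap = snap.parentId == null ? null : byId.get(snap.parentId);
    }
    if (last == null || last.parentId == null) {
      return last; // no snapshots at all, or the table's first snapshot was reached
    }
    // A parent exists but could not be loaded, so an older match cannot be ruled out.
    throw new IllegalStateException("Cannot determine oldest ancestor after " + timestampMillis);
  }

  /** Hypothetical fallback: the oldest ancestor that can still be loaded. */
  static Snap oldestKnownAncestor(Map<Long, Snap> byId, Snap current) {
    Snap last = null;
    for (Snap s = current; s != null; s = s.parentId == null ? null : byId.get(s.parentId)) {
      last = s;
    }
    return last;
  }

  /** Caller-side pattern: keep the old lenient behavior by catching the exception. */
  static Snap startingSnapshot(Map<Long, Snap> byId, Snap current, long timestampMillis) {
    try {
      return oldestAncestorAfter(byId, current, timestampMillis);
    } catch (IllegalStateException e) {
      return oldestKnownAncestor(byId, current);
    }
  }

  public static void main(String[] args) {
    Map<Long, Snap> byId = new HashMap<>();
    Snap s2 = new Snap(2L, 1L, 200L); // parent 1 is not in the map (assumed expired)
    Snap s3 = new Snap(3L, 2L, 300L);
    byId.put(2L, s2);
    byId.put(3L, s3);

    System.out.println(oldestAncestorAfter(byId, s3, 250L).id); // prints 3
    System.out.println(oldestAncestorAfter(byId, s3, 400L));    // prints null (future timestamp)
    System.out.println(startingSnapshot(byId, s3, 100L).id);    // prints 2 (fallback after the exception)
  }
}
```

The null return for a future timestamp is the case a caller such as `planFiles` has to check before calling `.snapshotId()`. In this sketch the strict method and the lenient fallback are kept separate so that each caller chooses its own behavior.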
