stevenzwu commented on code in PR #4329:
URL: https://github.com/apache/iceberg/pull/4329#discussion_r844261439
##########
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/source/ScanContext.java:
##########
@@ -114,77 +128,104 @@ private ScanContext(boolean caseSensitive, Long
snapshotId, Long startSnapshotId
this.includeColumnStats = includeColumnStats;
this.exposeLocality = exposeLocality;
this.planParallelism = planParallelism;
+
+ validate();
+ }
+
+ private void validate() {
+ if (isStreaming) {
+ if (startingStrategy ==
StreamingStartingStrategy.SPECIFIC_START_SNAPSHOT_ID) {
Review Comment:
Will the `startSnapshotId` or `startSnapshotTimestamp`, there is no regular
table scan. Say they map to some snapshot id X. With the latest agreement on
inclusive behavior for the starting snapshot, we will start the incremental
scan btw (X's parent snapshot id, Y]. Previous/current implementation has the
start snapshot as exclusive, which is the same behavior as the current source
impl.
Only `TABLE_SCAN_THEN_INCREMENTAL` does a regular table scan and then start
the incremental discovery process.
right now, I didn't add extensive/strict validations. We can't do some of
the validations. E.g., `endSnapshotId` will be added in the copied
`ScanContext`. Validation on `endSnapshotId` will fail. I guess we can fix the
copy method and unset the streaming related fields like `isStreaming`.
```
ScanContext incrementalScan = scanContext
.copyWithAppendsBetween(lastPosition.endSnapshotId(),
currentSnapshot.snapshotId());
```
Regarding the end condition, it might be useful for a future
`endSnapshotTimestamp`. A past `endSnapshotTimestamp` probably doesn't make
sense, since we can just do a regular batch scan with filters.
A `endSnapshotId` in the future also doesn't make sense, since we won't know
it when the streaming read starts. A `endSnapshotId` in the past should be
just a regular batch scan and hence is not applicable to streaming mode.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]