LinMingQiang commented on code in PR #6520:
URL: https://github.com/apache/hudi/pull/6520#discussion_r958448326
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java:
##########
@@ -128,12 +128,14 @@ public Result inputSplits(
return Result.EMPTY;
}
+ // The value may be 'earliest' or Null or outOfRange.
final String startCommit =
this.conf.getString(FlinkOptions.READ_START_COMMIT);
- final String endCommit = this.conf.getString(FlinkOptions.READ_END_COMMIT);
Review Comment:
The problem now is that when read.start-commit is out of range, the whole
table is scanned no matter what read.end-commit is set to. This scans
redundant data and ultimately produces incorrect results. I therefore think
the entire table should be scanned only when
(start-commit is null / out of range / earliest) && (end-commit is
null / out of range). (The current logic is: startFromEarliest ||
startOutOfRange || endOutOfRange.)
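
To make the difference concrete, here is a minimal sketch that compares the two conditions side by side. The class name, method names, and boolean parameters are hypothetical and only for illustration; they are not the actual fields or methods in IncrementalInputSplits.

```java
// Hypothetical helper, not part of Hudi: it only models the two boolean
// conditions discussed in this comment.
final class FullTableScanCheck {

    // Current logic as quoted above: any one of the flags triggers a full scan.
    static boolean currentLogic(boolean startFromEarliest,
                                boolean startOutOfRange,
                                boolean endOutOfRange) {
        return startFromEarliest || startOutOfRange || endOutOfRange;
    }

    // Proposed logic: a full scan happens only when BOTH ends are unbounded.
    static boolean proposedLogic(boolean startIsNull,
                                 boolean startFromEarliest,
                                 boolean startOutOfRange,
                                 boolean endIsNull,
                                 boolean endOutOfRange) {
        // Start is unbounded when it is null, 'earliest', or out of range.
        boolean startUnbounded = startIsNull || startFromEarliest || startOutOfRange;
        // End is unbounded when it is null or out of range.
        boolean endUnbounded = endIsNull || endOutOfRange;
        return startUnbounded && endUnbounded;
    }

    public static void main(String[] args) {
        // Start commit out of range, end commit set and in range.
        System.out.println(currentLogic(false, true, false));
        System.out.println(proposedLogic(false, false, true, false, false));
    }
}
```

With an out-of-range start commit and a valid end commit, the current condition selects a full table scan while the proposed one does not, so the end-commit bound would still have to be applied.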