voonhous commented on code in PR #7903:
URL: https://github.com/apache/hudi/pull/7903#discussion_r1101115877
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -380,7 +380,8 @@ private List<MergeOnReadInputSplit> buildFileIndex() {
.path(FilePathUtils.toFlinkPath(path))
.rowType(this.tableRowType)
.maxCompactionMemoryInBytes(maxCompactionMemoryInBytes)
- .requiredPartitions(getRequiredPartitionPaths()).build();
+ .requiredPartitions(getRequiredPartitionPaths())
+ .skipClustering(false).build();
Review Comment:
Question, is it safer to hardcode a false into this under `BatchMode`?
IIUC, when reading in `batchMode`, `skip_clustering` should always be
`false` and should not be configurable.
This is so as the reader will have an empty "state", i.e. no prior reads
performed. As such, there will be no "previous" records that have been read.
So, reading of reading of `replacecommits` must always be performed to ensure
correctness.
I suggest adding a comment to remind anyone modifying this part of the code
in the future of **WHY** this `skip_clustering` must be **false** here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]