[GitHub] [hudi] voonhous commented on a diff in pull request #7903: [HUDI-5734]Fix flink batch read skip clustering data lost

via GitHub Thu, 09 Feb 2023 00:24:10 -0800


voonhous commented on code in PR #7903:
URL: https://github.com/apache/hudi/pull/7903#discussion_r1101115877



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -380,7 +380,8 @@ private List<MergeOnReadInputSplit> buildFileIndex() {
             .path(FilePathUtils.toFlinkPath(path))
             .rowType(this.tableRowType)
             .maxCompactionMemoryInBytes(maxCompactionMemoryInBytes)
-            .requiredPartitions(getRequiredPartitionPaths()).build();
+            .requiredPartitions(getRequiredPartitionPaths())
+            .skipClustering(false).build();

Review Comment:
   Question, is it safer to hardcode a false into this under `BatchMode`? 
   
   IIUC, when reading in `batchMode`, `skip_clustering` should always be 
`false` and should not be configurable. 
   
   This is so as the reader will have an empty "state", i.e. no prior reads 
performed. As such, there will be no "previous" records that have been read. 
So, reading of reading of `replacecommits` must always be performed to ensure 
correctness.
   
   I suggest adding a comment to remind anyone modifying this part of the code 
in the future of **WHY** this `skip_clustering`  must be **false** here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] voonhous commented on a diff in pull request #7903: [HUDI-5734]Fix flink batch read skip clustering data lost

Reply via email to