xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r733458463



##########
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -161,6 +162,46 @@
     return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map<Path, List<FileSplit>> filterOutIncrementalSplits(
+      List<FileSplit> fileSplitList,

Review comment:
       @danny0405 
   I remembered why I did this.
   
   Otherwise we cannot read incremental data that was written before a replacecommit.
   
   Line 122 returns the latest fileSlices.
   
   Line 134 uses those latest fileSlices to filter the queried inputSplits.
   
   Suppose the table has 3 commits:
   commit1        file: (file1)
   commit2        file: (file2)
   replacecommit  file: (file3)
   
   Now we want to query the incremental data of commit2 (file2 will be picked as the inputSplit).
   
   Line 122 gives us the wrong fileSlices (they contain only file3).
   Line 134 then filters with those fileSlices, so every incremental inputSplit is excluded: the incremental inputSplit is file2, but the fileSlices contain only file3.
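
The scenario above can be reproduced with a small, self-contained sketch. This is not the Hudi code: files are modelled as plain strings rather than FileSplit or FileSlice objects, and the class and method names (IncrementalSplitSketch, filterByLatestSlices, pickIncrementalThenFilter) are hypothetical. It only illustrates why filtering by the latest fileSlices drops file2, and why setting the incremental splits aside before filtering keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: files are plain strings,
// not Hudi FileSplit/FileSlice objects.
final class IncrementalSplitSketch {

  // Keeps only the splits whose file appears in the latest file slices
  // (a stand-in for the filtering described for line 134).
  static List<String> filterByLatestSlices(List<String> splits, Set<String> latestSliceFiles) {
    List<String> kept = new ArrayList<>();
    for (String split : splits) {
      if (latestSliceFiles.contains(split)) {
        kept.add(split);
      }
    }
    return kept;
  }

  // Sets the incremental splits aside first, so the latest-slice filter
  // is applied only to the remaining splits.
  static List<String> pickIncrementalThenFilter(
      List<String> splits, Set<String> incrementalFiles, Set<String> latestSliceFiles) {
    List<String> result = new ArrayList<>();
    List<String> remaining = new ArrayList<>();
    for (String split : splits) {
      if (incrementalFiles.contains(split)) {
        result.add(split);
      } else {
        remaining.add(split);
      }
    }
    result.addAll(filterByLatestSlices(remaining, latestSliceFiles));
    return result;
  }

  public static void main(String[] args) {
    // After the replacecommit, the latest file slices contain only file3.
    Set<String> latestSliceFiles = new HashSet<>(Arrays.asList("file3"));
    // An incremental query on commit2 picks file2 as its input split.
    Set<String> incrementalFiles = new HashSet<>(Arrays.asList("file2"));
    List<String> queriedSplits = Arrays.asList("file2");

    // Filtering directly loses the incremental split.
    System.out.println(filterByLatestSlices(queriedSplits, latestSliceFiles));
    // Picking incremental splits first preserves it.
    System.out.println(
        pickIncrementalThenFilter(queriedSplits, incrementalFiles, latestSliceFiles));
  }
}
```

In this model the first call returns an empty list and the second returns a list containing only file2, which matches the behaviour described in the comment.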



