[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

GitBox Mon, 25 Oct 2021 21:13:11 -0700


danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r736128496




##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -161,6 +162,46 @@
     return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map<Path, List<FileSplit>> filterOutIncrementalSplits(
+      List<FileSplit> fileSplitList,

Review comment:
       Thanks, I got your idea. The input param `fileSplits` comes from the 
Hive input format scan, so it should include all the files on the query path, 
because `HoodieParquetRealtimeInputFormat#collectAllIncrementalFiles` 
generates the input splits from all the file slices.
   
   The incremental query shares this code path with the snapshot query, which 
complicates the code. Maybe we can split the incremental query code out of the 
method `getRealtimeSplits` into a new method `getIncrementalRealtimeSplits`? 
WDYT :)
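
A minimal sketch of the suggested split, for illustration only. It uses hypothetical stand-in types (`SplitDispatchSketch`, `Split`, `getSplits`) rather than the real Hudi or Hadoop classes, and the incremental filter rule (keep splits whose commit time is after a begin commit) is an assumption, not the logic in this PR. The point is only the shape: the caller decides the query type once, and each query type gets its own method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Hudi or Hadoop classes.
class SplitDispatchSketch {

  /** Stand-in for a file split: a path plus the commit time that wrote it. */
  static final class Split {
    final String path;
    final String commitTime;

    Split(String path, String commitTime) {
      this.path = path;
      this.commitTime = commitTime;
    }
  }

  /** Decide the query type once, then delegate to the dedicated method. */
  static List<Split> getSplits(List<Split> fileSplits, boolean incremental, String beginCommit) {
    return incremental
        ? getIncrementalRealtimeSplits(fileSplits, beginCommit)
        : getRealtimeSplits(fileSplits);
  }

  /** Snapshot path: every split found on the query path is kept. */
  static List<Split> getRealtimeSplits(List<Split> fileSplits) {
    return new ArrayList<>(fileSplits);
  }

  /** Incremental path (assumed rule): keep splits committed after beginCommit. */
  static List<Split> getIncrementalRealtimeSplits(List<Split> fileSplits, String beginCommit) {
    List<Split> result = new ArrayList<>();
    for (Split split : fileSplits) {
      // Commit times are compared as equal-length strings, an assumption of this sketch.
      if (split.commitTime.compareTo(beginCommit) > 0) {
        result.add(split);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Split> all = Arrays.asList(
        new Split("f1.parquet", "001"),
        new Split("f2.parquet", "003"));
    System.out.println("snapshot splits: " + getSplits(all, false, "002").size());
    System.out.println("incremental splits: " + getSplits(all, true, "002").size());
  }
}
```

With this shape, the snapshot method no longer needs any incremental-specific branching, which is the simplification the comment is asking about.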




