[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

GitBox Thu, 14 Oct 2021 03:03:52 -0700


danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728830538




##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -161,6 +162,51 @@
     return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map<Path, List<FileSplit>> filterOutIncrementalSplits(
+      List<FileSplit> fileSplitList,
+      List<InputSplit> rtSplits,
+      final Option<HoodieVirtualKeyInfo> finalHoodieVirtualKeyInfo) {
+    return fileSplitList.stream().filter(s -> {
+      // deal with incremental query.
+      try {
+        if (s instanceof BaseFileWithLogsSplit) {
+          BaseFileWithLogsSplit bs = (BaseFileWithLogsSplit)s;
+          if (bs.getBelongToIncrementalSplit()) {
+            rtSplits.add(new HoodieRealtimeFileSplit(bs, bs.getBasePath(), bs.getDeltaLogPaths(), bs.getMaxCommitTime(), finalHoodieVirtualKeyInfo));
+          }
+        } else if (s instanceof RealtimeBootstrapBaseFileSplit) {
+          rtSplits.add(s);
+        }
+      } catch (IOException e) {
+        throw new HoodieIOException("Error creating hoodie real time split ", e);
+      }
+      // filter the snapshot split.
+      if (s instanceof RealtimeBootstrapBaseFileSplit) {
+        return false;
+      } else if ((s instanceof BaseFileWithLogsSplit) && ((BaseFileWithLogsSplit) s).getBelongToIncrementalSplit()) {

Review comment:
       Why not just return early? I have pasted the code. Also, why do we need
to handle the incremental query first? Can we handle it together with the
snapshot query?
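
For readers following the thread, the early-return shape asked about here could look roughly like the sketch below. This is not the code the reviewer pasted and not the Hudi implementation: `Split`, `BootstrapSplit`, `BaseWithLogsSplit`, and `PlainSplit` are hypothetical stand-ins for the real split classes, the method returns a plain list rather than the `Map<Path, List<FileSplit>>` in the diff, and it assumes the truncated branch in the hunk ends in `return false`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class EarlyReturnSketch {
  // Hypothetical stand-ins for the Hudi split types; NOT the real classes.
  interface Split {}

  static final class BootstrapSplit implements Split {}

  static final class PlainSplit implements Split {}

  static final class BaseWithLogsSplit implements Split {
    final boolean incremental;

    BaseWithLogsSplit(boolean incremental) {
      this.incremental = incremental;
    }
  }

  // Moves incremental/bootstrap splits into rtSplits and returns the rest.
  // Each branch collects and returns in one place, so the type checks are
  // not repeated in a second if/else chain after the try block.
  static List<Split> filterOutIncremental(List<Split> splits, List<Split> rtSplits) {
    return splits.stream().filter(s -> {
      if (s instanceof BootstrapSplit) {
        rtSplits.add(s);
        return false; // already handled, drop it from the snapshot path
      }
      if (s instanceof BaseWithLogsSplit && ((BaseWithLogsSplit) s).incremental) {
        rtSplits.add(s);
        return false; // assumed outcome of the truncated branch in the diff
      }
      return true; // left for snapshot handling
    }).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Split> rtSplits = new ArrayList<>();
    List<Split> input = new ArrayList<>();
    input.add(new BootstrapSplit());
    input.add(new BaseWithLogsSplit(true));
    input.add(new BaseWithLogsSplit(false));
    input.add(new PlainSplit());
    List<Split> remaining = filterOutIncremental(input, rtSplits);
    System.out.println(rtSplits.size() + " collected, " + remaining.size() + " remaining");
  }
}
```

The predicate still mutates `rtSplits` as a side effect, which only behaves predictably on a sequential stream; a plain loop would avoid that, but the sketch keeps the stream form used in the diff.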




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

