danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728830538



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -161,6 +162,51 @@
     return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map<Path, List<FileSplit>> filterOutIncrementalSplits(
+      List<FileSplit> fileSplitList,
+      List<InputSplit> rtSplits,
+      final Option<HoodieVirtualKeyInfo> finalHoodieVirtualKeyInfo) {
+    return fileSplitList.stream().filter(s -> {
+      // deal with incremental query.
+      try {
+        if (s instanceof BaseFileWithLogsSplit) {
+          BaseFileWithLogsSplit bs = (BaseFileWithLogsSplit)s;
+          if (bs.getBelongToIncrementalSplit()) {
+            rtSplits.add(new HoodieRealtimeFileSplit(bs, bs.getBasePath(), bs.getDeltaLogPaths(), bs.getMaxCommitTime(), finalHoodieVirtualKeyInfo));
+          }
+        } else if (s instanceof RealtimeBootstrapBaseFileSplit) {
+          rtSplits.add(s);
+        }
+      } catch (IOException e) {
+        throw new HoodieIOException("Error creating hoodie real time split ", e);
+      }
+      // filter the snapshot split.
+      if (s instanceof RealtimeBootstrapBaseFileSplit) {
+        return false;
+      } else if ((s instanceof BaseFileWithLogsSplit) && ((BaseFileWithLogsSplit) s).getBelongToIncrementalSplit()) {

Review comment:
       Why not just return early? I have pasted the code. Also, why do we need
to handle the incremental query first? Can we handle it together with the
snapshot query?
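
To illustrate the "return early" idea, here is a minimal sketch. It is not the code pasted in the review, and it is not the Hudi implementation. The split classes are hypothetical stand-ins that only borrow names from the diff; `SplitPartitionSketch` and `partition` are invented for this example. It assumes the truncated filter drops bootstrap and incremental splits from the result and keeps everything else. Each split is classified once in a single pass, so the `instanceof` checks are not repeated.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPartitionSketch {
  // Hypothetical stand-ins for the Hudi split types named in the diff.
  static class FileSplit {
    final String name;
    FileSplit(String name) {
      this.name = name;
    }
  }

  static class RealtimeBootstrapBaseFileSplit extends FileSplit {
    RealtimeBootstrapBaseFileSplit(String name) {
      super(name);
    }
  }

  static class BaseFileWithLogsSplit extends FileSplit {
    private final boolean belongToIncrementalSplit;

    BaseFileWithLogsSplit(String name, boolean belongToIncrementalSplit) {
      super(name);
      this.belongToIncrementalSplit = belongToIncrementalSplit;
    }

    boolean getBelongToIncrementalSplit() {
      return belongToIncrementalSplit;
    }
  }

  // Routes each split exactly once: bootstrap and incremental splits go to
  // rtSplits, everything else is returned for the snapshot path.
  static List<FileSplit> partition(List<FileSplit> splits, List<FileSplit> rtSplits) {
    List<FileSplit> snapshotSplits = new ArrayList<>();
    for (FileSplit s : splits) {
      if (s instanceof RealtimeBootstrapBaseFileSplit) {
        rtSplits.add(s);
        continue; // early exit, no second instanceof check later
      }
      if (s instanceof BaseFileWithLogsSplit
          && ((BaseFileWithLogsSplit) s).getBelongToIncrementalSplit()) {
        // The diff wraps this in a HoodieRealtimeFileSplit; the stand-in
        // is added directly to keep the sketch self-contained.
        rtSplits.add(s);
        continue;
      }
      snapshotSplits.add(s);
    }
    return snapshotSplits;
  }
}
```

Whether the snapshot splits can then be converted in the same loop depends on how the rest of the method groups them by path, which is not visible in this hunk.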



