xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728743555



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##########
@@ -119,6 +307,11 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     addProjectionToJobConf(realtimeSplit, jobConf);
     LOG.info("Creating record reader with readCols :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
         + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
+
+    // for a log-only split we don't need the parquet reader, so set it to empty
+    if (FSUtils.isLogFile(realtimeSplit.getPath())) {

Review comment:
       agree
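
For readers following along: the guard above skips creating a parquet record reader when the split points at a Hudi log file. A minimal, self-contained sketch of that kind of path-based check (the `.log.<version>` marker match here is a simplification for illustration, not the real `FSUtils.isLogFile` implementation):

```java
import java.util.regex.Pattern;

public class LogFileCheck {

  // Simplified stand-in for FSUtils.isLogFile: treat any file name containing
  // a ".log.<version>" segment as a log file. Hudi's actual naming scheme has
  // more parts (file id, base commit time, write token), so this is a sketch.
  private static final Pattern LOG_FILE_PATTERN = Pattern.compile(".*\\.log\\.\\d+.*");

  public static boolean isLogFile(String fileName) {
    return LOG_FILE_PATTERN.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    // A log-only split would take the "skip parquet reader" branch:
    System.out.println(isLogFile("f1_1-0-1.log.1"));      // true
    System.out.println(isLogFile("f1_20210101.parquet")); // false
  }
}
```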

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -69,17 +70,18 @@
  private static final Logger LOG = LogManager.getLogger(HoodieRealtimeInputFormatUtils.class);
 
  public static InputSplit[] getRealtimeSplits(Configuration conf, Stream<FileSplit> fileSplits) throws IOException {
-    Map<Path, List<FileSplit>> partitionsToParquetSplits =
-        fileSplits.collect(Collectors.groupingBy(split -> split.getPath().getParent()));
+    // for all unique split parents, obtain all delta files based on delta commit timeline,
+    // grouped on file id
+    List<InputSplit> rtSplits = new ArrayList<>();
+    List<FileSplit> candidateFileSplits = fileSplits.collect(Collectors.toList());
+    Map<Path, List<FileSplit>> partitionsToParquetSplits = candidateFileSplits

Review comment:
       ok
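
As a side note, the reworked code above first materializes the stream into a list and then groups the candidate splits by their parent path. A self-contained sketch of that pattern, using a hypothetical `FileSplit` record in place of Hadoop's `FileSplit`/`Path` types:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupSplits {

  // Hypothetical stand-in for Hadoop's FileSplit/Path, just enough to show
  // the grouping-by-parent pattern from getRealtimeSplits.
  public record FileSplit(String path) {
    public String parent() {
      int i = path.lastIndexOf('/');
      return i <= 0 ? "/" : path.substring(0, i);
    }
  }

  // Materialize the stream first (as the PR change does, so the collected
  // list can also be used for other bookkeeping), then group by parent dir.
  public static Map<String, List<FileSplit>> groupByParent(Stream<FileSplit> fileSplits) {
    List<FileSplit> candidateFileSplits = fileSplits.collect(Collectors.toList());
    return candidateFileSplits.stream()
        .collect(Collectors.groupingBy(FileSplit::parent));
  }

  public static void main(String[] args) {
    Map<String, List<FileSplit>> grouped = groupByParent(Stream.of(
        new FileSplit("/tbl/2021/01/a.parquet"),
        new FileSplit("/tbl/2021/01/b.parquet"),
        new FileSplit("/tbl/2021/02/c.parquet")));
    System.out.println(grouped.get("/tbl/2021/01").size()); // 2
    System.out.println(grouped.get("/tbl/2021/02").size()); // 1
  }
}
```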




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

