danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655226296



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       > When we query the Hudi table with Hive, the same NPE may also occur
       > [here](https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L105).
       > Should we add the same check?

   Oops, I guess we should.
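
   For reference, the guard under discussion can be sketched outside of Hudi as below. This is only an illustration of the pattern, not the actual Hudi code: `LogOnlyFileGroupGuard`, `Slice`, and `pairSplitsWithLogs` are hypothetical names, and plain strings stand in for `HoodieBaseFile` and log file paths. The idea is that a file group with only log files has no entry in the grouped base-file map, so the lookup returns null and must be skipped instead of dereferenced.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogOnlyFileGroupGuard {

  // Hypothetical stand-in for a file slice: a file ID plus its (already sorted) log file paths.
  static final class Slice {
    final String fileId;
    final List<String> logPaths;

    Slice(String fileId, List<String> logPaths) {
      this.fileId = fileId;
      this.logPaths = logPaths;
    }
  }

  // Maps each base file to the log files of its file group,
  // skipping file groups that have no base file at all.
  static Map<String, List<String>> pairSplitsWithLogs(
      Map<String, List<String>> baseFilesByFileId, List<Slice> slices) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Slice slice : slices) {
      List<String> baseFiles = baseFilesByFileId.get(slice.fileId);
      if (baseFiles == null) {
        // Log-only file group: there is no base file split to attach the logs to.
        // Without this check, iterating baseFiles would throw a NullPointerException.
        continue;
      }
      for (String baseFile : baseFiles) {
        result.put(baseFile, slice.logPaths);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> baseFiles = new HashMap<>();
    baseFiles.put("f1", Collections.singletonList("f1_base.parquet"));

    List<Slice> slices = Arrays.asList(
        new Slice("f1", Collections.singletonList(".f1.log.1")),
        new Slice("f2", Collections.singletonList(".f2.log.1"))); // f2 has logs only

    // Only the f1 base file appears in the result; f2 is skipped rather than failing.
    System.out.println(pairSplitsWithLogs(baseFiles, slices));
  }
}
```

   The same null check would apply at any other call site that looks up base files by file ID, which is what the question above is asking about.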




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

