[GitHub] [hudi] nsivabalan commented on a diff in pull request #6254: [HUDI-4508] Repair the exception when reading optimized query for mor in hive and presto/trino

GitBox Thu, 04 Aug 2022 18:41:49 -0700


nsivabalan commented on code in PR #6254:
URL: https://github.com/apache/hudi/pull/6254#discussion_r938374940



##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java:
##########
@@ -189,11 +190,16 @@ protected List<FileStatus> listStatusForIncrementalMode(JobConf job,
     return HoodieInputFormatUtils.filterIncrementalFileStatus(jobContext, tableMetaClient, timeline.get(), fileStatuses, commitsToCheck.get());
   }
 
-  protected FileStatus createFileStatusUnchecked(FileSlice fileSlice, HiveHoodieTableFileIndex fileIndex, Option<HoodieVirtualKeyInfo> virtualKeyInfoOpt) {
+  protected Option<? extends FileStatus> createFileStatusUnchecked(FileSlice fileSlice, HiveHoodieTableFileIndex fileIndex, Option<HoodieVirtualKeyInfo> virtualKeyInfoOpt) {
     Option<HoodieBaseFile> baseFileOpt = fileSlice.getBaseFile();
+    Option<HoodieLogFile> latestLogFileOpt = fileSlice.getLatestLogFile();
 
     if (baseFileOpt.isPresent()) {
       return getFileStatusUnchecked(baseFileOpt.get());
+    } else if (latestLogFileOpt.isPresent()) {
+      // It happens when reading optimized query to mor.
+      LOG.info("File slice(" + fileSlice.getFileId() + ") has no base-file but log-file. Skip the slice.");
+      return Option.empty();

Review Comment:
   With the HBase index, it's feasible in some cases to have only log files
without a base file. The same is true for Kafka Connect. So this is a valid
case even for snapshot queries, if I am not wrong. It looks like this fix was
needed anyway, even if not for read-optimized queries.
   @alexeykudinkin @yihua 
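
A minimal, self-contained sketch of the behavior discussed above, for illustration only. It does not use the Hudi API: `Slice`, `createFileStatus`, and `listReadOptimized` are hypothetical stand-ins, file paths are plain strings, and `java.util.Optional` replaces Hudi's `Option`. Throwing when a slice has neither a base file nor a log file is also an assumption; the diff hunk above does not show that branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FileSliceSketch {

    // Hypothetical stand-in for a file slice: an optional base file plus an
    // optional latest log file, both represented here as plain path strings.
    static final class Slice {
        final String fileId;
        final Optional<String> baseFile;
        final Optional<String> latestLogFile;

        Slice(String fileId, String baseFile, String latestLogFile) {
            this.fileId = fileId;
            this.baseFile = Optional.ofNullable(baseFile);
            this.latestLogFile = Optional.ofNullable(latestLogFile);
        }
    }

    // Mirrors the shape of the change in the diff: return the base file if
    // present, return empty for a log-only slice instead of failing.
    static Optional<String> createFileStatus(Slice slice) {
        if (slice.baseFile.isPresent()) {
            return slice.baseFile;
        } else if (slice.latestLogFile.isPresent()) {
            // Log-only slice (possible, per the review comment, with the HBase
            // index or Kafka Connect): nothing to read in a base-file-only listing.
            return Optional.empty();
        }
        // Assumption: a slice with neither file is treated as invalid.
        throw new IllegalStateException(
            "File slice " + slice.fileId + " has neither a base file nor a log file");
    }

    // Collects base files only, silently skipping log-only slices.
    static List<String> listReadOptimized(List<Slice> slices) {
        List<String> result = new ArrayList<>();
        for (Slice slice : slices) {
            createFileStatus(slice).ifPresent(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slice> slices = Arrays.asList(
            new Slice("f1", "f1.parquet", ".f1.log.1"), // base file and log file
            new Slice("f2", null, ".f2.log.1"));        // log file only
        System.out.println(listReadOptimized(slices));
    }
}
```

Returning an empty optional lets the caller decide how to treat log-only slices, rather than encoding that decision as an exception inside the per-slice helper.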



