yihua commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1372608672


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -94,16 +94,17 @@ public Comparable getOrderingValue(Option<InternalRow> rowOption,
 
   @Override
   public HoodieRecord<InternalRow> constructHoodieRecord(Option<InternalRow> rowOption,
-                                                         Map<String, Object> metadataMap,
-                                                         Schema schema) {
+                                                         Map<String, Object> metadataMap) {
     if (!rowOption.isPresent()) {
       return new HoodieEmptyRecord<>(
           new HoodieKey((String) metadataMap.get(INTERNAL_META_RECORD_KEY),
               (String) metadataMap.get(INTERNAL_META_PARTITION_PATH)),
           HoodieRecord.HoodieRecordType.SPARK);
     }
 
+    Schema schema = (Schema) metadataMap.get(INTERNAL_META_SCHEMA);
     InternalRow row = rowOption.get();
+    boolean isPartial = (boolean) metadataMap.getOrDefault(INTERNAL_META_IS_PARTIAL, false);
     return new HoodieSparkRecord(row, HoodieInternalRowUtils.getCachedSchema(schema));

Review Comment:
   Good catch.  Somehow I missed this.  Fixed now.
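
   For context: with this change, `constructHoodieRecord` derives both the schema and the partial-update flag from the metadata map instead of taking a separate `Schema` parameter. Below is a minimal, self-contained sketch of that map-based pattern; the key names and the `constructRecord` helper are hypothetical stand-ins for illustration, not the actual Hudi internals.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Sketch of the metadata-map pattern: callers stash typed values under
   // well-known keys, and the record constructor casts them back out.
   public class MetadataMapSketch {
     static final String META_RECORD_KEY = "meta.record.key"; // hypothetical key
     static final String META_SCHEMA = "meta.schema";         // hypothetical key
     static final String META_IS_PARTIAL = "meta.is.partial"; // hypothetical key

     static String constructRecord(Map<String, Object> metadataMap) {
       // Required entry: an unconditional cast, like the INTERNAL_META_SCHEMA
       // lookup in the diff; a missing key simply yields null here.
       String schema = (String) metadataMap.get(META_SCHEMA);
       // Optional entry: absent keys fall back to false, mirroring
       // getOrDefault(INTERNAL_META_IS_PARTIAL, false).
       boolean isPartial = (boolean) metadataMap.getOrDefault(META_IS_PARTIAL, false);
       return "record[key=" + metadataMap.get(META_RECORD_KEY)
           + ", schema=" + schema + ", partial=" + isPartial + "]";
     }

     public static void main(String[] args) {
       Map<String, Object> meta = new HashMap<>();
       meta.put(META_RECORD_KEY, "uuid-1");
       meta.put(META_SCHEMA, "{\"type\":\"record\"}");
       // META_IS_PARTIAL deliberately omitted: defaults to false.
       System.out.println(constructRecord(meta));
     }
   }
   ```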



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -151,7 +151,7 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
   public void initRecordIterators() {
     this.baseFileIterator = baseFilePath.isPresent()
         ? readerContext.getFileRecordIterator(
-            baseFilePath.get().getHadoopPath(), start, length, readerState.baseFileAvroSchema, readerState.baseFileAvroSchema, hadoopConf)
+        baseFilePath.get().getHadoopPath(), start, length, readerState.baseFileAvroSchema, readerState.baseFileAvroSchema, hadoopConf)

Review Comment:
   Fixed.



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -51,6 +61,28 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile =>
                                      requiredSchema: Schema,
                                      conf: Configuration): ClosableIterator[InternalRow] = {
     val fileInfo = sparkAdapter.getSparkPartitionedFileUtils.createPartitionedFile(partitionValues, filePath, start, length)
-    new CloseableInternalRowIterator(baseFileReader.apply(fileInfo))
+    if (filePath.toString.contains(HoodieLogFile.DELTA_EXTENSION)) {

Review Comment:
   This file path can be an InlineFS URL like `inlinefs://path/h3/.80646032-a18e-444f-a7ff-ae518cea8bdb-0_20231026055441030.log.1_0-257-368/file/?start_offset=672&length=2669`, which cannot be properly checked by `FSUtils.isLogFile`: that check only inspects the file name, and in the InlineFS form the log-file name is buried in the middle of the path while the final segment is just `file`. Should I fix `FSUtils.isLogFile`?
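
   To make the failure mode concrete, here is a small runnable sketch. Assumptions: the `.log` marker stands in for `HoodieLogFile.DELTA_EXTENSION`, `fileNameHasLogExtension` is a hypothetical helper approximating a check that only inspects the final path segment (the way a name-based test would behave), and the plain path's directory is made up for illustration.

   ```java
   import java.net.URI;

   // A check on the last path segment misses InlineFS URLs: the wrapped
   // log-file name sits in the middle of the path, and the final segment
   // is just "file". A substring test over the whole path catches both.
   public class LogFilePathSketch {
     static final String DELTA_EXTENSION = ".log"; // stand-in for HoodieLogFile.DELTA_EXTENSION

     // Approximation of a file-name-only check (hypothetical helper).
     static boolean fileNameHasLogExtension(String url) {
       String path = URI.create(url).getPath();
       if (path.endsWith("/")) {
         path = path.substring(0, path.length() - 1);
       }
       String name = path.substring(path.lastIndexOf('/') + 1);
       return name.contains(DELTA_EXTENSION);
     }

     // Whole-path check, as in the diff above.
     static boolean pathContainsLogExtension(String url) {
       return url.contains(DELTA_EXTENSION);
     }

     public static void main(String[] args) {
       String plain = "hdfs://ns/tbl/.80646032-a18e-444f-a7ff-ae518cea8bdb-0_20231026055441030.log.1_0-257-368";
       String inline = "inlinefs://path/h3/.80646032-a18e-444f-a7ff-ae518cea8bdb-0_20231026055441030"
           + ".log.1_0-257-368/file/?start_offset=672&length=2669";

       System.out.println(fileNameHasLogExtension(plain));   // true
       System.out.println(fileNameHasLogExtension(inline));  // false: last segment is "file"
       System.out.println(pathContainsLogExtension(inline)); // true
     }
   }
   ```

   This is why the diff falls back to a substring test over the full path string: it classifies both layouts as log data, at least until the name-based check understands InlineFS paths.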



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
