danny0405 commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1368210408


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##########
@@ -195,6 +206,10 @@ public HoodieKey getKey() {
     return key;
   }
 
+  public boolean isPartial() {
+    return isPartial;

Review Comment:
   Not sure why we store `isPartial` flag in row-level, from high level, the 
`isPartial` should at lest to be with glanularity of commit/a write batch level.



##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:
##########
@@ -67,6 +70,7 @@ public abstract class HoodieReaderContext<T> {
    * file.
    *
    * @param filePath       {@link Path} instance of a file.
+   * @param isLogFile      Whether this is a log file.
    * @param start          Starting byte to start reading.

Review Comment:
   Can we infer the `isLogFile` flag directly from the file extension?



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -94,17 +94,18 @@ public Comparable getOrderingValue(Option<InternalRow> 
rowOption,
 
   @Override
   public HoodieRecord<InternalRow> constructHoodieRecord(Option<InternalRow> 
rowOption,
-                                                         Map<String, Object> 
metadataMap,
-                                                         Schema schema) {
+                                                         Map<String, Object> 
metadataMap) {
     if (!rowOption.isPresent()) {
       return new HoodieEmptyRecord<>(
           new HoodieKey((String) metadataMap.get(INTERNAL_META_RECORD_KEY),
               (String) metadataMap.get(INTERNAL_META_PARTITION_PATH)),
           HoodieRecord.HoodieRecordType.SPARK);
     }
 
+    Schema schema = (Schema) metadataMap.get(INTERNAL_META_SCHEMA);
     InternalRow row = rowOption.get();
-    return new HoodieSparkRecord(row, 
HoodieInternalRowUtils.getCachedSchema(schema));
+    boolean isPartial = (boolean) 
metadataMap.getOrDefault(INTERNAL_META_IS_PARTIAL, false);
+    return new HoodieSparkRecord(row, 
HoodieInternalRowUtils.getCachedSchema(schema), isPartial);

Review Comment:
   Can you explain in high level why we need to know a record comes from 
partial update or not? Isn't the partial update attribute of commit?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##########
@@ -126,12 +128,13 @@ protected Option<T> doProcessNextDataRecord(T record,
       // Merge and store the combined record
       // Note that the incoming `record` is from an older commit, so it should 
be put as
       // the `older` in the merge API
+
       HoodieRecord<T> combinedRecord = (HoodieRecord<T>) recordMerger.merge(
-          readerContext.constructHoodieRecord(Option.of(record), metadata, 
readerSchema),
-          readerSchema,
+          readerContext.constructHoodieRecord(Option.of(record), metadata),
+          (Schema) metadata.get(INTERNAL_META_SCHEMA),
           readerContext.constructHoodieRecord(
-              existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight(), readerSchema),
-          readerSchema,
+              existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight()),
+          (Schema) 
existingRecordMetadataPair.getRight().get(INTERNAL_META_SCHEMA),
           payloadProps).get().getLeft();

Review Comment:
   Why not just initialize the `readerSchema` with `(Schema) 
metadata.get(INTERNAL_META_SCHEMA)` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to