danny0405 commented on code in PR #13600:
URL: https://github.com/apache/hudi/pull/13600#discussion_r2251674083


##########
hudi-common/src/main/java/org/apache/hudi/common/engine/RecordContext.java:
##########
@@ -39,39 +41,54 @@
 
 import javax.annotation.Nullable;
 
+import java.io.IOException;
 import java.io.Serializable;
 import java.util.List;
 import java.util.Map;
+import java.util.Properties;
 import java.util.function.BiFunction;
 
 import static org.apache.hudi.common.model.HoodieRecord.HOODIE_IS_DELETED_FIELD;
 import static org.apache.hudi.common.model.HoodieRecord.RECORD_KEY_METADATA_FIELD;
 
+/**
+ * Record context provides the APIs for record related operations. Record context is associated with
+ * a corresponding {@link HoodieReaderContext} and is used for getting field values from a record,
+ * transforming a record etc.
+ */
 public abstract class RecordContext<T> implements Serializable {
 
   private static final long serialVersionUID = 1L;
 
-  private SerializableBiFunction<T, Schema, String> recordKeyExtractor;
+  private final SerializableBiFunction<T, Schema, String> recordKeyExtractor;
   // for encoding and decoding schemas to the spillable map
   private final LocalAvroSchemaCache localAvroSchemaCache = LocalAvroSchemaCache.getInstance();
 
   protected JavaTypeConverter typeConverter;
-  protected String partitionPath;
+  protected String partitionPath = null;
 
-  public RecordContext(HoodieTableConfig tableConfig) {
+  protected RecordContext(HoodieTableConfig tableConfig) {
     this.typeConverter = new DefaultJavaTypeConverter();
-    updateRecordKeyExtractor(tableConfig, tableConfig.populateMetaFields());
-  }
-
-  public void updateRecordKeyExtractor(HoodieTableConfig tableConfig, boolean shouldUseMetadataFields) {
-    this.recordKeyExtractor = shouldUseMetadataFields ? metadataKeyExtractor() : virtualKeyExtractor(tableConfig.getRecordKeyFields()
+    this.recordKeyExtractor = tableConfig.populateMetaFields() ? metadataKeyExtractor() : virtualKeyExtractor(tableConfig.getRecordKeyFields()
         .orElseThrow(() -> new IllegalArgumentException("No record keys specified and meta fields are not populated")));
   }
 
   public void setPartitionPath(String partitionPath) {
     this.partitionPath = partitionPath;
   }
 
+  public T extractDataFromRecord(HoodieRecord record, Schema schema, Properties properties) {
+    try {
+      if (record.getData() instanceof HoodieRecordPayload) {

Review Comment:
   This is hacky; we should get rid of it once Spark abandons the Avro write path and `AvroBinaryRecord` is introduced. Can you create a JIRA to track this?
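
   For readers following the thread, here is a minimal sketch of the branch being flagged, to show why it reads as hacky: payload-based records force a detour through Avro before reaching the engine-native type. The `convertAvroRecord` hook and the non-payload fallback below are assumptions for illustration, not the PR's actual method body.

   ```java
   // Illustrative sketch only; assumes RecordContext<T> exposes a
   // convertAvroRecord(IndexedRecord) hook for engine-native conversion.
   public T extractDataFromRecord(HoodieRecord record, Schema schema, Properties properties) {
     try {
       if (record.getData() instanceof HoodieRecordPayload) {
         // Legacy Avro write path: materialize the payload as an Avro record
         // first, then convert it to the engine-native representation T.
         Option<IndexedRecord> avro =
             ((HoodieRecordPayload<?>) record.getData()).getInsertValue(schema, properties);
         return avro.map(this::convertAvroRecord).orElse(null);
       }
       // Engine-native records are assumed to already carry data of type T.
       return (T) record.getData(); // unchecked cast, acceptable in a sketch
     } catch (IOException e) {
       throw new HoodieIOException("Failed to extract data from record", e);
     }
   }
   ```

   Once Spark moves off the Avro write path and `AvroBinaryRecord` lands, the payload branch (and the Avro round trip it forces) should become unnecessary, which is the motivation for the JIRA request above.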


