nsivabalan commented on code in PR #18132:
URL: https://github.com/apache/hudi/pull/18132#discussion_r3042656464
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -86,7 +87,8 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema = baseFileReader.getSchema();
+ Schema readerSchema = AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
Review Comment:
+1
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSchemaUtils.scala:
##########
@@ -234,4 +234,18 @@ object HoodieSchemaUtils {
(newSchema, isSchemaCompatible(tableSchema, newSchema))
}
}
+
+ /**
+ * Evaluates whether a timestamp-millis field is present in the schema. This is required for repairing the schema.
+ *
+ * @param schema Avro schema to evaluate
+ * @return true if schema is null or schema contains timestamp millis field
+ */
+ def hasTimestampMillisField(schema: Schema): Boolean = {
+ if (schema == null) {
+ true
Review Comment:
why do we return true if the schema is null?
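To make the behavior under discussion concrete, here is an illustrative sketch (not Hudi's actual Scala implementation): Avro schemas are JSON, so a recursive walk over the parsed JSON can detect whether any field carries the `timestamp-millis` logical type. The function name and the dict-based schema representation are assumptions for illustration; the null branch returns true as a conservative default, which appears to be what the reviewer is questioning.

```python
def has_timestamp_millis_field(schema):
    """Return True if the schema is None or contains a timestamp-millis field.

    Illustrative sketch over an Avro schema parsed from JSON; not Hudi's code.
    """
    if schema is None:
        # Conservative default: with no schema to inspect, assume repair is needed.
        return True
    if isinstance(schema, dict):
        if schema.get("logicalType") == "timestamp-millis":
            return True
        # Recurse into record fields, nested types, array items, and map values.
        return any(has_timestamp_millis_field(v)
                   for k, v in schema.items()
                   if k in ("fields", "type", "items", "values"))
    if isinstance(schema, list):  # Avro unions are JSON arrays of branches.
        return any(has_timestamp_millis_field(s) for s in schema)
    return False

record = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}
print(has_timestamp_millis_field(record))  # True
print(has_timestamp_millis_field(None))    # True (the null-schema behavior being questioned)
```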
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergedReadHandle.java:
##########
@@ -59,9 +60,12 @@ public HoodieMergedReadHandle(HoodieWriteConfig config,
HoodieTable<T, I, K, O> hoodieTable,
Pair<String, String> partitionPathFileIDPair) {
super(config, instantTime, hoodieTable, partitionPathFileIDPair);
- readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
+ Schema originalReaderSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
// config.getSchema is not canonicalized, while config.getWriteSchema is canonicalized. So, we have to use the canonicalized schema to read the existing data.
baseFileReaderSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getWriteSchema()), config.allowOperationMetadataField());
+ // Repair reader schema.
Review Comment:
Let's fix this; it should not take much effort to fix.
Let's fix 0.15.1 if need be.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -86,7 +87,8 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema = baseFileReader.getSchema();
+ Schema readerSchema = AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
Review Comment:
Can you help me understand why we are not trying to infer `isLogicalTimestampRepairEnabled` in the executor code?
Isn't the idea to parse the schema once in the driver, pass the inferred flag to the executors, and avoid the schema repair code altogether if the table does not contain any logical types at all?
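The driver/executor split suggested above can be sketched as follows. This is a hypothetical illustration, not Hudi's code: all function names are invented, and the plain function call stands in for shipping the precomputed flag to Spark executors. The point is that the schema inspection runs once, and per-file merge tasks skip the repair path entirely when the table has no logical types.

```python
def schema_has_logical_types(schema):
    """Driver-side: one pass over a JSON-parsed Avro schema for any logical type."""
    if isinstance(schema, dict):
        return "logicalType" in schema or any(
            schema_has_logical_types(v) for k, v in schema.items()
            if k in ("fields", "type", "items", "values"))
    if isinstance(schema, list):  # union branches
        return any(schema_has_logical_types(s) for s in schema)
    return False

def repair_schema(file_schema, writer_schema):
    """Placeholder for the per-file schema repair (the potentially costly step)."""
    return writer_schema

def merge_file(file_schema, writer_schema, repair_enabled):
    # Executor-side: the flag was computed once on the driver, so tables
    # without logical types never enter the repair code path per file.
    if repair_enabled:
        return repair_schema(file_schema, writer_schema)
    return file_schema

table_schema = {"type": "record", "name": "T",
                "fields": [{"name": "id", "type": "string"}]}
flag = schema_has_logical_types(table_schema)  # computed once, on the driver
print(flag)                                     # False
print(merge_file({"f": 1}, {"f": 2}, flag))     # {'f': 1}: repair skipped
```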
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]