nsivabalan commented on code in PR #18132:
URL: https://github.com/apache/hudi/pull/18132#discussion_r3042656464
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -86,7 +87,8 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema = baseFileReader.getSchema();
+ Schema readerSchema = AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
Review Comment:
+1
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSchemaUtils.scala:
##########
@@ -234,4 +234,18 @@ object HoodieSchemaUtils {
(newSchema, isSchemaCompatible(tableSchema, newSchema))
}
}
+
+ /**
+ * Evaluates whether a timestamp-millis field is present in the schema. This is required for repairing the schema.
+ *
+ * @param schema Avro schema to evaluate
+ * @return true if schema is null or schema contains timestamp millis field
+ */
+ def hasTimestampMillisField(schema: Schema): Boolean = {
+ if (schema == null) {
+ true
Review Comment:
why do we return true if the schema is null?
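To make the behavior under discussion concrete, here is an illustrative sketch (not Hudi's actual Scala implementation): Avro schemas are JSON, so a recursive walk over the parsed JSON can detect whether any field carries the `timestamp-millis` logical type. The function name and the dict-based schema representation are assumptions for illustration; the null branch returns true as a conservative default, which appears to be what the reviewer is questioning.

```python
def has_timestamp_millis_field(schema):
    """Return True if the schema is None or contains a timestamp-millis field.

    Illustrative sketch over an Avro schema parsed from JSON; not Hudi's code.
    """
    if schema is None:
        # Conservative default: with no schema to inspect, assume repair is needed.
        return True
    if isinstance(schema, dict):
        if schema.get("logicalType") == "timestamp-millis":
            return True
        # Recurse into record fields, nested types, array items, and map values.
        return any(has_timestamp_millis_field(v)
                   for k, v in schema.items()
                   if k in ("fields", "type", "items", "values"))
    if isinstance(schema, list):  # Avro unions are JSON arrays of branches.
        return any(has_timestamp_millis_field(s) for s in schema)
    return False

record = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}
print(has_timestamp_millis_field(record))  # True
print(has_timestamp_millis_field(None))    # True (the null-schema behavior being questioned)
```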
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergedReadHandle.java:
##########
@@ -59,9 +60,12 @@ public HoodieMergedReadHandle(HoodieWriteConfig config,
HoodieTable<T, I, K, O> hoodieTable,
Pair<String, String> partitionPathFileIDPair) {
super(config, instantTime, hoodieTable, partitionPathFileIDPair);
- readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
+ Schema originalReaderSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
// config.getSchema is not canonicalized, while config.getWriteSchema is canonicalized. So, we have to use the canonicalized schema to read the existing data.
baseFileReaderSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getWriteSchema()), config.allowOperationMetadataField());
+ // Repair reader schema.
Review Comment:
Let's fix this; it should not take much effort to fix.
Let's fix 0.15.1 if need be.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -86,7 +87,8 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema = baseFileReader.getSchema();
+ Schema readerSchema = AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
Review Comment:
Can you help me understand why we are not trying to infer `isLogicalTimestampRepairEnabled` in the executor code?
Isn't the idea to parse the schema once in the driver, pass the inferred flag to the executors, and avoid the schema repair code altogether if the table does not contain any logical types at all?
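The driver/executor split suggested above can be sketched as follows. This is a hypothetical illustration, not Hudi's code: all function names are invented, and the plain function call stands in for shipping the precomputed flag to Spark executors. The point is that the schema inspection runs once, and per-file merge tasks skip the repair path entirely when the table has no logical types.

```python
def schema_has_logical_types(schema):
    """Driver-side: one pass over a JSON-parsed Avro schema for any logical type."""
    if isinstance(schema, dict):
        return "logicalType" in schema or any(
            schema_has_logical_types(v) for k, v in schema.items()
            if k in ("fields", "type", "items", "values"))
    if isinstance(schema, list):  # union branches
        return any(schema_has_logical_types(s) for s in schema)
    return False

def repair_schema(file_schema, writer_schema):
    """Placeholder for the per-file schema repair (the potentially costly step)."""
    return writer_schema

def merge_file(file_schema, writer_schema, repair_enabled):
    # Executor-side: the flag was computed once on the driver, so tables
    # without logical types never enter the repair code path per file.
    if repair_enabled:
        return repair_schema(file_schema, writer_schema)
    return file_schema

table_schema = {"type": "record", "name": "T",
                "fields": [{"name": "id", "type": "string"}]}
flag = schema_has_logical_types(table_schema)  # computed once, on the driver
print(flag)                                     # False
print(merge_file({"f": 1}, {"f": 2}, flag))     # {'f': 1}: repair skipped
```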
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]