yihua commented on code in PR #14120:
URL: https://github.com/apache/hudi/pull/14120#discussion_r2443875319
##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##########
@@ -1208,7 +1226,9 @@ public static Object rewritePrimaryType(Object oldValue,
Schema oldSchema, Schem
return oldValue;
case LONG:
if (oldSchema.getLogicalType() != newSchema.getLogicalType()) {
- if (oldSchema.getLogicalType() instanceof
LogicalTypes.TimestampMillis) {
+ if (skipLogicalTimestampEvolution || oldSchema.getLogicalType() ==
null || newSchema.getLogicalType() == null) {
Review Comment:
However, for handling the timestamp issue this PR addresses, the ingestion
writer needs to rewrite the schema from timestamp micros to millis.
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormatHelper.scala:
##########
@@ -171,8 +234,14 @@ object HoodieParquetFileFormatHelper {
}
}
- if (typeChangeInfos.isEmpty) {
+ if (typeChangeInfos.isEmpty && columnsToMultiply.isEmpty) {
Review Comment:
Does this mean that the record-level projection overhead is only incurred if
there are columns to apply multiplication?
##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##########
@@ -1208,7 +1226,9 @@ public static Object rewritePrimaryType(Object oldValue,
Schema oldSchema, Schem
return oldValue;
case LONG:
if (oldSchema.getLogicalType() != newSchema.getLogicalType()) {
- if (oldSchema.getLogicalType() instanceof
LogicalTypes.TimestampMillis) {
+ if (skipLogicalTimestampEvolution || oldSchema.getLogicalType() ==
null || newSchema.getLogicalType() == null) {
Review Comment:
Based on my understanding, previously,
`AvroSchemaCompatibility#calculateCompatibility` does not validate the logical
timestamp evolution before, so timestamp micros to timestamp millis can happen
which leads to precision loss, and such schema evolution should not be allowed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]