voonhous commented on code in PR #19029:
URL: https://github.com/apache/hudi/pull/19029#discussion_r3446161162
##########
hudi-common/src/test/java/org/apache/hudi/internal/schema/utils/TestAvroSchemaEvolutionUtils.java:
##########
@@ -563,8 +563,68 @@ public void
testNotEvolveSchemaIfReconciledSchemaUnchanged() {
InternalSchema oldInternalSchema =
InternalSchemaConverter.convert(oldSchema);
// set a non-default schema id for old table schema, e.g., 2.
oldInternalSchema.setSchemaId(2);
- InternalSchema evolvedSchema =
AvroSchemaEvolutionUtils.reconcileSchema(incomingSchema.getAvroSchema(),
oldInternalSchema, false);
+ InternalSchema evolvedSchema =
AvroSchemaEvolutionUtils.reconcileSchema(incomingSchema.getAvroSchema(),
oldInternalSchema, false, false);
// the evolved schema should be the old table schema, since there is no
type change at all.
Assertions.assertEquals(oldInternalSchema, evolvedSchema);
}
+
+ @Test
+ public void testReconcileSchemaTimestampPrecisionEvolution() {
Review Comment:
Could we add a gate-*open* `assertThrows` that cross-zone (UTC vs local) is
still rejected, e.g. `timestamp-micros` vs `local-timestamp-micros`? That's the
key invariant of the feature and nothing pins it right now.
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/SchemaChangeUtils.java:
##########
@@ -90,6 +96,18 @@ private static boolean isTypeUpdateAllowInternal(Type src,
Type dst) {
return isDecimalFixedUpdateAllowInternal(src, dst);
case STRING:
return dst == Types.DateType.get() || dst.typeId() ==
Type.TypeID.DECIMAL || dst.typeId() == Type.TypeID.DECIMAL_FIXED || dst ==
Types.BinaryType.get();
+ case TIMESTAMP:
Review Comment:
The gate is bidirectional, but the read-side repair only recovers toward
millis (and long-to-local). The reverse (toward micros) has no read-side
recovery, so old files keep the stale label and you get mixed-precision reads.
Since the corruption is always "values are really millis," could we drop the
micros directions and only allow micros-to-millis,
local-micros-to-local-millis, and long-to-local? A one-line "relabel only,
longs not rescaled" comment here would help too.
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java:
##########
@@ -83,6 +83,15 @@ public class HoodieCommonConfig extends HoodieConfig {
+ " operation will fail schema compatibility check. Set this option
to true will make the missing "
+ " column be filled with null values to successfully complete the
write operation.");
+ public static final ConfigProperty<Boolean>
ALLOW_TIMESTAMP_PRECISION_EVOLUTION = ConfigProperty
+ .key("hoodie.write.schema.allow.timestamp.precision.evolution")
+ .defaultValue(false)
+ .markAdvanced()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Controls whether schema evolution may change a
column between timestamp-millis and "
Review Comment:
This relabels the logical type but never rescales the stored longs. Worth
spelling that out in the doc: values aren't converted, it's a one-time
migration aid, and enabling it on a healthy `timestamp-micros` table where an
incoming batch declares millis will silently read every value 1000x off. Also
worth noting correct reads of existing files depend on the #14161 read repair.
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/action/TableChanges.java:
##########
@@ -232,11 +230,11 @@ protected Integer findIdByFullName(String fullName) {
}
public static ColumnUpdateChange get(InternalSchema schema) {
- return new ColumnUpdateChange(schema);
+ return new ColumnUpdateChange(schema, false, false);
}
- public static ColumnUpdateChange get(InternalSchema schema, boolean
caseSensitive) {
- return new ColumnUpdateChange(schema, caseSensitive);
+ public static ColumnUpdateChange get(InternalSchema schema, boolean
caseSensitive, boolean allowTimestampPrecisionEvolution) {
Review Comment:
Two small things here: (a) the description says pre-existing overloads were
kept as delegates, but this `get(schema, caseSensitive, ...)`,
`isTypeUpdateAllow`, and `reconcileSchema` actually changed signature in place
- no real breakage since they're all `internal.schema.*` with no external
callers, just worth fixing the wording. (b) the 1-arg `get(schema)` just below
stays hardcoded `false`, so explicit `ALTER TABLE ... ALTER COLUMN TYPE` won't
honor the flag. Intentional? Fine either way, just undocumented.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]