voonhous commented on PR #19029: URL: https://github.com/apache/hudi/pull/19029#issuecomment-4757561607
Took a pass. Logic looks correct: the gated transitions are exactly the same-zone precision relabels plus long-to-local-timestamp, cross-zone stays rejected, and `shouldPromoteType` keeps the gate off so canonicalization can't undo a repair. A few things: 1. Main one: this relabels the logical type but never rescales the stored longs. That's right for the 0.x corruption you're targeting, but if someone enables it on a healthy `timestamp-micros` table and an incoming batch declares millis, every value silently reads 1000x off. The doc only describes what it permits - can we add a clear warning that values aren't rescaled, that it's a one-time migration aid, and that reads of existing files depend on the #14161 read repair? 2. Related: the gate is bidirectional, but the read-side repair only recovers toward millis (and long-to-local). The reverse directions (toward micros) have no read-side recovery, so old files keep the stale label while new ones don't, giving mixed-precision reads. Since the corruption is always "values are really millis," do we need the micros directions at all? Might be cleaner to only allow micros-to-millis, local-micros-to-local-millis, and long-to-local. 3. Test gap: nothing asserts that with the gate *open*, cross-zone (UTC vs local) is still rejected - that's the key invariant to pin. Worth an `assertThrows` in `testReconcileSchemaTimestampPrecisionEvolution`. Couple nits: - Description says "pre-existing overloads kept as delegates, so no public-API breakage," but the signatures actually changed in place (`reconcileSchema`, `isTypeUpdateAllow`, `ColumnUpdateChange.get(schema, caseSensitive)`). No real breakage since they're all `internal.schema.*` with no external callers, just worth fixing the wording. - Explicit `ALTER TABLE ... ALTER COLUMN TYPE` doesn't honor the flag (goes through the 1-arg `ColumnUpdateChange.get`, hardcoded false). Intentional? Fine either way, just undocumented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
