voonhous commented on PR #19029: URL: https://github.com/apache/hudi/pull/19029#issuecomment-4757665668
Two more correctness things after a closer look, both about the fix being narrower/less durable than it first looks: 1. Durability across engines. This corrects the table schema, but existing parquet files still carry the old `timestamp-micros` label with millis values - only Hudi's own reader applies the #14161 repair. External readers (Trino/Presto, Athena, BigQuery external, Spark-native parquet) will misread those files until they're physically rewritten. So the fix is only durable for non-Hudi engines after the affected base files get rewritten via clustering/compaction under the corrected schema. Worth calling out in the doc, with "rewrite the files" as the second migration step. 2. The flag is ignored on some write paths. Tracing `deduceWriterSchema`: it's honored on the internal-schema reconcile path (schema-on-read / `saveInternalSchema`), the MOR merge handles, and the Spark default path only when `setNullForMissingColumns=true`. It's *not* consulted for `MERGE INTO` writes or when `hoodie.avro.schema.validate=true` (both take the else branch in `deduceWriterSchemaWithoutReconcile`, which doesn't reconcile), nor on the legacy `reconcileSchemasLegacy` path (uses `shouldPromoteType`, hardcoded false). Consistent with the PR's scope, but a user who sets the flag and writes via `MERGE INTO` will see no effect and no error. Might be worth documenting which paths honor it, and whether `MERGE INTO` needs covering - the current tests only hit the sync path. Lower priority, didn't verify: column stats / data-skipping bounds for the relabeled column were computed under the old logical type - worth a sanity check that pruning stays correct after the relabel. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
