voonhous commented on PR #18961:
URL: https://github.com/apache/hudi/pull/18961#issuecomment-4678781999
Heads up on a latent bug inherited from the #18938 read path, found while
testing inference here and fixed in this PR:
`VariantReconstruction` never engaged on real files. The reader derives the
file schema by converting the parquet footer MessageType, and that conversion
loses the variant logical type, so the `getType() == VARIANT` check never
matched. The shredded base file was then read with the unshredded `{metadata,
value}` projection, silently dropping all typed_value fields on the AVRO read
path (a reconstructed row came back as just the residual, e.g.
`{"bad-key":false}`).
Fixed in this PR with shape-based detection anchored on the requested
column: the requested side (from the table schema, logical type intact) must be
a variant, and the on-disk side is matched by the shredded shape `{metadata:
bytes, value: [nullable] bytes, typed_value}`, so a plain footer-derived record
still triggers reconstruction.
Worth noting why #18938 did not catch it: the new COW inference test here is
the first to read a shredded base file end to end through the AVRO reader. The
existing MOR compaction test compacts a log-only file group (no shredded base
to read), and post-compaction queries go through the Spark native reader.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]