kbuci commented on issue #18711: URL: https://github.com/apache/hudi/issues/18711#issuecomment-4423834271
@cshuo:

> The Hudi read path should use the writer/table HoodieSchema to read the file and then project/convert to the requested schema.

Thanks for the clarification! When I first encountered this issue, my thought process was to check whether we could generally ensure that all Hudi Flink read/write code paths "infer" with the HoodieSchema. But I think we should instead follow the model you highlighted: this schema-inference logic only needs to happen at read time, which is already the precedent in Spark and Flink.

I'm updating my existing Flink-variant PR to take this approach (still similar to approach A in essence): https://github.com/apache/hudi/pull/18539

Once this is wired up, the Blob and Vector implementations should be easier to understand and add.
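For illustration, the read-time model described above can be sketched roughly as follows. This is a conceptual sketch only, not actual Hudi/Flink APIs; all names (`WRITER_SCHEMA`, `read_record`, `project`, the sample fields) are hypothetical:

```python
# Conceptual sketch of the read-time model: decode each record with the
# writer/table schema first, then project/convert it to the schema the
# query requested. None of these names correspond to real Hudi APIs.

WRITER_SCHEMA = ["id", "name", "embedding", "blob_data"]   # full table schema
REQUESTED_SCHEMA = ["id", "embedding"]                     # query projection

def read_record(raw_fields):
    """Decode a raw row using the full writer schema."""
    return dict(zip(WRITER_SCHEMA, raw_fields))

def project(record, requested_schema):
    """Convert a fully decoded record to the requested schema."""
    return {field: record[field] for field in requested_schema}

def read(rows, requested_schema):
    # Read with the writer schema, then project -- never attempt to
    # decode the file directly against the (narrower) requested schema.
    return [project(read_record(r), requested_schema) for r in rows]

rows = [(1, "a", [0.1, 0.2], b"\x00"), (2, "b", [0.3, 0.4], b"\x01")]
print(read(rows, REQUESTED_SCHEMA))
```

The point of the sketch is that schema inference lives only on the read side: the writer schema is the single source of truth for decoding, and the requested schema is applied as a projection afterwards.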
