voonhous commented on PR #17833: URL: https://github.com/apache/hudi/pull/17833#issuecomment-4053605308
> @rahil-c already asked about Trino/Presto/Hive in the RFC review. If someone writes a Variant column from Spark and then tries to read the table from Hive or Trino today, what happens? Does it blow up at the schema inference step? Does it silently surface as a struct of two binary columns? This should be documented, even if the answer is "unsupported for now" — otherwise users are going to file bugs when they hit it in production.

As long as Hive sync passes, the Variant column will surface as a struct of two binary fields, so users will need some sort of byte -> base64 operation to see their data. It does not blow up. The read path for records and binary columns is well tested and mature, so nothing unexpected should happen. There are functional tests in Spark 3.5 covering backward compatibility, i.e. tables written with Spark 4.0 containing Variant columns are read back to simulate this scenario. On top of that, this is documented in RFC-99. Can you please be more specific about where you want this to be documented?
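To illustrate the byte -> base64 step mentioned above, here is a minimal sketch. It assumes (hypothetically) that the struct's two binary fields are named `metadata` and `value`; the actual field names depend on the physical layout the writer produced, so treat this purely as an example of making the raw bytes human-readable:

```python
import base64

# Hypothetical row as surfaced through Hive: a Variant column read back
# as a struct of two binary fields. Field names and byte contents here
# are illustrative, not the actual on-disk encoding.
variant_row = {"metadata": b"\x01\x00\x00", "value": b"\x0c\x2a"}

# Base64-encode each binary field so the opaque bytes can be displayed
# or logged as plain ASCII strings.
readable = {k: base64.b64encode(v).decode("ascii") for k, v in variant_row.items()}
print(readable)  # e.g. {'metadata': 'AQAA', 'value': 'DCo='}
```

The same effect can be achieved directly in SQL with an engine's built-in base64 function, if one is available, rather than post-processing the bytes client-side.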
