voonhous commented on PR #17833: URL: https://github.com/apache/hudi/pull/17833#issuecomment-4053605308
> @rahil-c already asked about Trino/Presto/Hive in the RFC review. If someone writes a Variant column from Spark and then tries to read the table from Hive or Trino today, what happens? Does it blow up at the schema inference step? Does it silently surface as a struct of two binary columns? This should be documented, even if the answer is "unsupported for now" — otherwise users are going to file bugs when they hit it in production.

As long as Hive sync passes, the Variant column will surface as a struct of two binary fields, so users will need some sort of byte -> base64 operation to see their data. It does not blow up. The read path for records and binary columns is well tested and mature, so nothing unexpected should happen. There are functional tests in Spark 3.5 covering backward compatibility, i.e. tables written with Spark 4.0 containing Variant columns are read back to simulate this scenario. On top of that, this is documented in RFC-99. Can you please be more specific about where you want this to be documented?
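To illustrate the byte -> base64 step mentioned above, here is a minimal sketch. It assumes (hypothetically) that the struct's two binary fields are named `metadata` and `value`; the actual field names depend on the physical layout the writer produced, so treat this purely as an example of making the raw bytes human-readable:

```python
import base64

# Hypothetical row as surfaced through Hive: a Variant column read back
# as a struct of two binary fields. Field names and byte contents here
# are illustrative, not the actual on-disk encoding.
variant_row = {"metadata": b"\x01\x00\x00", "value": b"\x0c\x2a"}

# Base64-encode each binary field so the opaque bytes can be displayed
# or logged as plain ASCII strings.
readable = {k: base64.b64encode(v).decode("ascii") for k, v in variant_row.items()}
print(readable)  # e.g. {'metadata': 'AQAA', 'value': 'DCo='}
```

The same effect can be achieved directly in SQL with an engine's built-in base64 function, if one is available, rather than post-processing the bytes client-side.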
