the-other-tim-brown opened a new issue, #17744: URL: https://github.com/apache/hudi/issues/17744
### Feature Description

**What the feature achieves:**
Hudi readers and writers should be able to handle datasets with variant types. This will allow users to work with semi-structured data more easily.

**Why this feature is needed:**
The variant type is now formally defined in Parquet and engines like Spark have full support for this type. Users with semi-structured data are otherwise forced to use strings or byte arrays to store this data.

### User Experience

**How users will use this feature:**
- Configuration changes needed
- API changes
- Usage examples

The user will be able to add a Variant field to the schema and write data with this type using an engine like Spark where the data type is already supported. Readers should be able to return the variant type for engines that support it or otherwise return the underlying struct with the raw fields. The schema should be parseable by older Hudi readers for compatibility.

### Hudi RFC Requirements

**RFC PR link:** (if applicable)
This is an extension of RFC-99 which introduces the type system for Hudi. This will be the first type added that is not already supported in Avro.

**Why RFC is/isn't needed:**
- Does this change public interfaces/APIs? (Yes/No)
- Does this change storage format? (Yes/No)
- Justification:
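
The reader behaviour described under User Experience (return the variant type where the engine supports it, otherwise the underlying struct) could look roughly like the sketch below. This is an illustration only, not Hudi code: `Field`, `resolve_read_field`, and the `engine_supports_variant` flag are hypothetical names, and the fallback struct is assumed to hold the two binary fields `metadata` and `value` used by the unshredded Parquet variant encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified schema field (not a Hudi class)."""
    name: str
    type_name: str          # e.g. "variant", "binary", "struct"
    children: tuple = ()    # nested fields, used only for structs


# Assumption: an unshredded variant is physically stored as two binary fields.
VARIANT_FALLBACK_CHILDREN = (
    Field("metadata", "binary"),
    Field("value", "binary"),
)


def resolve_read_field(field: Field, engine_supports_variant: bool) -> Field:
    """Return the field as a reader might expose it to the engine."""
    # Non-variant fields, or engines with native variant support, pass through.
    if field.type_name != "variant" or engine_supports_variant:
        return field
    # Otherwise expose the underlying struct with the raw fields.
    return Field(field.name, "struct", VARIANT_FALLBACK_CHILDREN)


payload = Field("payload", "variant")
native = resolve_read_field(payload, engine_supports_variant=True)
fallback = resolve_read_field(payload, engine_supports_variant=False)
```

Here `native` keeps the variant type, while `fallback` is a struct whose children are the raw `metadata` and `value` fields, which is also the shape an older reader without variant support would be expected to parse.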
