the-other-tim-brown opened a new issue, #17744:
URL: https://github.com/apache/hudi/issues/17744

   ### Feature Description
   
   **What the feature achieves:**
   Hudi readers and writers should be able to handle datasets with variant 
types. This will allow users to work with semi-structured data more easily.
   
   **Why this feature is needed:**
   The variant type is now formally defined in Parquet and engines like Spark 
have full support for this type. Users with semi-structured data are otherwise 
forced to use strings or byte arrays to store this data.
   
   
   ### User Experience
   
   **How users will use this feature:**
   - Configuration changes needed
   - API changes
   - Usage examples
   
   The user will be able to add a Variant field to the schema and write data 
with this type using an engine like Spark where the data type is already 
supported.
   
   Readers should return the variant type for engines that support it, and 
otherwise return the underlying struct with the raw fields.
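
   A minimal sketch of that reader fallback, assuming the raw fields are the 
`metadata` and `value` binaries of the Parquet variant encoding. The names 
`RawVariant`, `Variant`, and `read_variant_field` are hypothetical and not 
part of any Hudi API, and the byte strings are placeholders rather than a 
valid variant encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawVariant:
    """Underlying struct: the two binary fields assumed for a variant."""
    metadata: bytes
    value: bytes


@dataclass(frozen=True)
class Variant:
    """Stand-in for an engine-native variant value (hypothetical wrapper)."""
    raw: RawVariant


def read_variant_field(raw: RawVariant, engine_supports_variant: bool):
    """Return a native variant when the engine supports it, else the raw struct."""
    if engine_supports_variant:
        return Variant(raw)
    # Fallback: expose the raw fields as a plain struct (a dict here).
    return {"metadata": raw.metadata, "value": raw.value}


# Placeholder bytes, not a real variant payload.
raw = RawVariant(metadata=b"\x01\x00", value=b"\x0c\x2a")
native = read_variant_field(raw, engine_supports_variant=True)
fallback = read_variant_field(raw, engine_supports_variant=False)
```

   In this sketch the caller decides per engine whether the native type is 
available, so the stored data is the same in both cases and only the returned 
shape differs.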
   
   The schema should be parseable by older Hudi readers for compatibility.
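
   One way this compatibility could work, sketched under assumptions: the 
variant is stored as a plain record of two `bytes` fields carrying an extra 
annotation, and a reader that does not recognize the annotation falls back to 
the plain record (the Avro specification asks implementations to ignore 
unknown logical types). The schema layout, the `variant` annotation name, and 
`parse_as_older_reader` are all hypothetical; this issue does not specify 
Hudi's actual representation.

```python
import json

# Hypothetical Avro-style schema for a variant field (assumed layout).
variant_schema = {
    "type": "record",
    "name": "variant_payload",
    "logicalType": "variant",  # assumed annotation name
    "fields": [
        {"name": "metadata", "type": "bytes"},
        {"name": "value", "type": "bytes"},
    ],
}

# Logical types this imaginary older reader already understands.
KNOWN_LOGICAL_TYPES = {"decimal", "date", "timestamp-millis", "timestamp-micros"}


def parse_as_older_reader(schema_json: str) -> dict:
    """Mimic a reader that predates the variant annotation.

    An unrecognized logicalType is dropped and the underlying record is kept.
    """
    schema = json.loads(schema_json)
    if schema.get("logicalType") not in KNOWN_LOGICAL_TYPES:
        schema.pop("logicalType", None)
    return schema


older_view = parse_as_older_reader(json.dumps(variant_schema))
```

   Here `older_view` is an ordinary record with two binary fields, which is 
the shape an older reader would be expected to handle without changes.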
   
   
   ### Hudi RFC Requirements
   
   **RFC PR link:** (if applicable)
   This is an extension of RFC-99, which introduces the type system for Hudi. 
This will be the first type added that is not already supported in Avro.
   
   **Why RFC is/isn't needed:**
   - Does this change public interfaces/APIs? (Yes/No)
   - Does this change storage format? (Yes/No)
   - Justification:
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

