Re: [I] Add low level support for shredding and unshredding [arrow-rs]

via GitHub Thu, 19 Jun 2025 17:17:47 -0700


alamb commented on issue #7715:
URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-2989456380


   Here is how I imagine Variant support would look:
   1. In arrow-rs, arrays that hold variants will be `StructArray`s with at 
least two fields: `metadata` and `value`, both of binary (or maybe binary 
dictionary / RLE) type. This mirrors the Parquet physical representation
   2. The fact that the column is a Variant is represented using the `Metadata` 
on a field (as an extension type)
   3. Arrays that hold "shredded" variant values will be `StructArray`s with 
the two `metadata` and `value` columns and the additional shredded columns
   4. Higher level engines (like DataFusion) will need to translate requests to 
extract a field from a variant object to the appropriate APIs on those 
StructArrays
   5. We will need some new kernels (probably living in `parquet-variant`) 
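
A minimal sketch of the layout in points 1 to 3, written against plain Rust types rather than the arrow-rs API so that it stands alone. `VariantStruct`, `Column`, and the extension name `example.variant` are hypothetical names chosen for illustration; only the `ARROW:extension:name` key comes from the Arrow format, and nothing here is an agreed design.

```rust
use std::collections::HashMap;

/// Field-metadata key that Arrow reserves for naming an extension type.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
/// Placeholder extension name (an assumption, not a settled identifier).
const VARIANT_EXTENSION_NAME: &str = "example.variant";

/// Stand-in for an Arrow child array; one optional slot per row.
enum Column {
    Binary(Vec<Option<Vec<u8>>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Binary(rows) => rows.len(),
            Column::Int64(rows) => rows.len(),
        }
    }
}

/// Plain-Rust model of a struct-shaped Variant column.
struct VariantStruct {
    /// Annotations on the field that mark the struct as a Variant (point 2).
    field_metadata: HashMap<String, String>,
    /// Named children: `metadata`, `value`, then any shredded columns.
    children: Vec<(String, Column)>,
}

impl VariantStruct {
    /// Build the unshredded form: exactly `metadata` and `value` (point 1).
    fn new(
        metadata: Vec<Option<Vec<u8>>>,
        value: Vec<Option<Vec<u8>>>,
    ) -> Result<Self, String> {
        if metadata.len() != value.len() {
            return Err("metadata and value must have the same number of rows".to_string());
        }
        let mut field_metadata = HashMap::new();
        field_metadata.insert(
            EXTENSION_NAME_KEY.to_string(),
            VARIANT_EXTENSION_NAME.to_string(),
        );
        Ok(Self {
            field_metadata,
            children: vec![
                ("metadata".to_string(), Column::Binary(metadata)),
                ("value".to_string(), Column::Binary(value)),
            ],
        })
    }

    /// Append a shredded column next to `metadata` and `value` (point 3).
    fn with_shredded(mut self, name: &str, column: Column) -> Result<Self, String> {
        if column.len() != self.children[0].1.len() {
            return Err(format!("shredded column `{}` has the wrong number of rows", name));
        }
        self.children.push((name.to_string(), column));
        Ok(self)
    }

    /// True when the field annotation marks this struct as a Variant.
    fn is_variant(&self) -> bool {
        self.field_metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
            == Some(VARIANT_EXTENSION_NAME)
    }

    /// True when columns beyond `metadata` and `value` are present.
    fn is_shredded(&self) -> bool {
        self.children.len() > 2
    }
}
```

In this model an engine would check `is_variant()` on the field annotation before treating the struct's children specially, which is the role the extension type plays in point 2.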
   
   Some potential new kernels:
   1. Extract a field of a Variant column (annotated StructArray) into a new 
array (`variant_extract` as proposed by @scovich perhaps)
   2. (maybe) Add a new field/fields to a Variant column
   3. JSON value --> variant (annotated StructArray)
   4. variant --> JSON
   5. `variant_to_struct` (perhaps an extension of `variant_extract`) to cast 
between a variant and a (real) `StructArray`
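
To make the first kernel concrete, here is a rough sketch of what a `variant_extract`-style function could do over a shredded column: read the shredded column when it has a value for the row, otherwise fall back to the residual `value`. This is plain Rust, not the arrow-rs or `parquet-variant` API. The `Value` enum stands in for an already-decoded variant (the real `value` column is binary), and `VariantColumn` and the function signature are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Simplified, already-decoded stand-in for a variant value.
/// The real representation is binary; this enum is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Object(Vec<(String, Value)>),
}

/// Hypothetical model of a (possibly shredded) Variant column.
struct VariantColumn {
    /// Residual value per row: whatever was not shredded.
    value: Vec<Option<Value>>,
    /// Shredded Int64 fields: field name -> one optional slot per row.
    typed: HashMap<String, Vec<Option<i64>>>,
}

/// Sketch of a `variant_extract`-style kernel: one output slot per input row.
fn variant_extract(col: &VariantColumn, field: &str) -> Vec<Option<Value>> {
    (0..col.value.len())
        .map(|row| {
            // Prefer the shredded column when it holds a value for this row.
            let shredded = col
                .typed
                .get(field)
                .and_then(|column| column.get(row).copied().flatten());
            if let Some(v) = shredded {
                return Some(Value::Int(v));
            }
            // Otherwise look the field up in the residual object, if any.
            match &col.value[row] {
                Some(Value::Object(fields)) => fields
                    .iter()
                    .find(|(name, _)| name.as_str() == field)
                    .map(|(_, v)| v.clone()),
                _ => None,
            }
        })
        .collect()
}
```

The fallback is what lets a row whose field did not fit the shredded type (for example a string where an integer was expected) still be returned from the residual `value`.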
   
   
   I'll try and work up an example / diagram of how this would work over the 
next few days


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

