alamb opened a new issue, #8153: URL: https://github.com/apache/arrow-rs/issues/8153
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

- part of https://github.com/apache/arrow-rs/issues/6736

**Note this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task**

We are trying to support the general case of the `variant_get` function, which allows runtime dynamic access to Variants (either shredded or unshredded).

- We found in https://github.com/apache/arrow-rs/issues/8083 that supporting `variant_get` is quite complicated (see [here](https://github.com/apache/arrow-rs/pull/8122#discussion_r2277736911)), so we are proposing to break it down into multiple pieces.

**This ticket tracks**

Support `variant_get` for `Some(DataType::Struct)` (nested shredding)

The idea here is that the user would specify a "shredding schema" (similar to what @friendlymatthew is sketching out in https://github.com/apache/arrow-rs/pull/7921) and the `variant_get` kernel would produce a `VariantArray` with the defined schema, extracting fields as necessary.

Implementing this functionality will likely require the basic representation for shredded Variant arrays along with path traversal in `variant_get`.

However, it does **NOT** cover the following (which are / will be broken into separate tickets)

- Support for retrieving as a specific non-Struct data type (e.g. `Some(DataType::Utf8)`)
- Retrieving any arbitrary path and returning what is there (no type specified)
- Retrieving any arbitrary path as a Variant (aka "unshredding")

**Describe the solution you'd like**

@scovich sketched out a high level design for Shredded Objects (see [Representing Variant In Arrow Proposal: "Shredding an Object"](https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?tab=t.0#heading=h.wediefuitb91) and [Variant Shredding::Objects](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects)) in this PR
- https://github.com/apache/arrow-rs/pull/8122

This likely requires the ability to create variants without modifying the metadata
- https://github.com/apache/arrow-rs/issues/8152

So roughly that means supporting

```rust
// get the named field of variant object as a typed field
variant_get(array, "$.field_name", DataType::Struct <....>)
```

Where `$.field_name` represents some arbitrary `VariantPath` such as `a` for field "a", or `a.b` for field "b" of field "a"

And `DataType::Struct` is a "shredding schema" that reflects both `value` and `typed_value`

This should work for:
1. Variants where the field_name is in a `typed_value`
2. Variants where the field_name is not in the `typed_value`

**Describe alternatives you've considered**

1. Add a test that manually constructs a shredded variant array (follow the example in the arrow proposal)
2. Add a test that calls `variant_get` appropriately
3. Implement the code

I suggest getting this working for non-nested objects first, and then working on nesting / pathing as a second PR

**Additional context**

Reference
- [Variant Spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types)
- [Variant Shredding Spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#value-shredding)
- [Representing Variant In Arrow Proposal](https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4)
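
To make the two lookup cases above concrete (field present in `typed_value`, field only in the residual `value`), here is a minimal, standard-library-only sketch of path traversal over a single shredded row. Everything in it is an assumption for illustration: `Plain`, `Typed`, `Shredded`, `Cursor`, `step`, `get_path`, `as_int` and `example_row` are hypothetical names, not arrow-rs or parquet-variant APIs, and the path is a bare dotted string like `a.b` rather than a real `VariantPath`. It ignores arrays, nulls, metadata, and the columnar layout entirely.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an unshredded Variant value (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
enum Plain {
    Int(i64),
    Object(BTreeMap<String, Plain>),
}

// Hypothetical stand-in for the strongly typed `typed_value` of one row.
#[derive(Debug, Clone, PartialEq)]
enum Typed {
    Int(i64),
    Object(BTreeMap<String, Shredded>),
}

// One row of a shredded column: the residual `value` plus the `typed_value`.
#[derive(Debug, Clone, PartialEq)]
struct Shredded {
    value: Option<Plain>,
    typed_value: Option<Typed>,
}

// Where a path lookup currently points: still inside shredded data,
// or inside an unshredded residual value.
enum Cursor<'a> {
    Shredded(&'a Shredded),
    Plain(&'a Plain),
}

// Follow one field name from the current position.
fn step<'a>(cur: Cursor<'a>, name: &str) -> Option<Cursor<'a>> {
    match cur {
        Cursor::Shredded(s) => {
            // Case 1: the field was shredded, so it lives under typed_value.
            if let Some(Typed::Object(fields)) = &s.typed_value {
                if let Some(field) = fields.get(name) {
                    return Some(Cursor::Shredded(field));
                }
            }
            // Case 2: the field was not shredded, so look in the residual value.
            match &s.value {
                Some(Plain::Object(fields)) => fields.get(name).map(Cursor::Plain),
                _ => None,
            }
        }
        Cursor::Plain(Plain::Object(fields)) => fields.get(name).map(Cursor::Plain),
        Cursor::Plain(_) => None,
    }
}

// Walk a dotted path such as "a.b" one field at a time.
fn get_path<'a>(root: &'a Shredded, path: &str) -> Option<Cursor<'a>> {
    let mut cur = Cursor::Shredded(root);
    for name in path.split('.') {
        cur = step(cur, name)?;
    }
    Some(cur)
}

// Read an integer from wherever the cursor ended up.
fn as_int(cur: &Cursor) -> Option<i64> {
    match cur {
        Cursor::Shredded(s) => match (&s.typed_value, &s.value) {
            (Some(Typed::Int(i)), _) => Some(*i),
            (None, Some(Plain::Int(i))) => Some(*i),
            _ => None,
        },
        Cursor::Plain(Plain::Int(i)) => Some(*i),
        _ => None,
    }
}

// Row equivalent to {"a": {"b": 1}, "c": 2}, where only "a" (and "a.b") is shredded.
fn example_row() -> Shredded {
    let b = Shredded { value: None, typed_value: Some(Typed::Int(1)) };
    let a = Shredded {
        value: None,
        typed_value: Some(Typed::Object(BTreeMap::from([("b".to_string(), b)]))),
    };
    Shredded {
        value: Some(Plain::Object(BTreeMap::from([("c".to_string(), Plain::Int(2))]))),
        typed_value: Some(Typed::Object(BTreeMap::from([("a".to_string(), a)]))),
    }
}
```

With `example_row()`, `get_path(&row, "a.b")` resolves through `typed_value` at both levels, while `get_path(&row, "c")` falls back to the residual `value`. A real kernel would presumably do this column-wise over Arrow arrays and would also have to build output that matches the requested shredding schema, which this sketch does not attempt.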