alamb opened a new issue, #8154:
URL: https://github.com/apache/arrow-rs/issues/8154

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   - part of https://github.com/apache/arrow-rs/issues/6736
   
   **Note: this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task.**
   
   We are trying to support the general case of the `variant_get` function, which allows runtime dynamic access to Variants (either shredded or unshredded).
   
   - We found in https://github.com/apache/arrow-rs/issues/8083 that supporting `variant_get` is quite complicated (see [here](https://github.com/apache/arrow-rs/pull/8122#discussion_r2277736911)), so we are proposing to break it down into multiple pieces.
   
   **This ticket tracks**
   Support `variant_get` for `Some(DataType::VARIANT)`
   
   The idea here is that the user could reconstruct an unshredded Variant from any input Variant (either shredded or unshredded).
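To make "reconstruct an unshredded Variant" concrete, here is a minimal sketch of the per-row `value` / `typed_value` rules from the Variant Shredding spec. It assumes already-decoded values instead of binary-encoded bytes and Arrow arrays; the `Variant`, `Typed`, `Shredded`, `unshred`, and `typed_to_variant` names are hypothetical illustrations, not the arrow-rs or `parquet-variant` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified decoded Variant (not the `parquet-variant` crate's `Variant` type).
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int64(i64),
    String(String),
    Object(BTreeMap<String, Variant>),
}

/// Hypothetical shredded type held in a `typed_value` column.
#[derive(Debug, Clone)]
enum Typed {
    Int64(i64),
    String(String),
    /// A shredded object: every field is itself a `value` / `typed_value` pair.
    Object(BTreeMap<String, Shredded>),
}

/// One `value` / `typed_value` pair, with `value` already decoded for brevity.
#[derive(Debug, Clone)]
struct Shredded {
    value: Option<Variant>,
    typed_value: Option<Typed>,
}

/// Rebuild an unshredded Variant from one shredded slot.
/// Returns `Ok(None)` when the slot is missing (both columns null).
fn unshred(s: &Shredded) -> Result<Option<Variant>, String> {
    match (&s.value, &s.typed_value) {
        // Missing: only valid for fields of a shredded object.
        (None, None) => Ok(None),
        // Not shredded: `value` holds the whole Variant (possibly Variant::Null).
        (Some(v), None) => Ok(Some(v.clone())),
        // Shredded: convert the typed value back into a Variant.
        (None, Some(t)) => typed_to_variant(t).map(Some),
        // Partially shredded object: merge residual fields from `value` with shredded ones.
        (Some(Variant::Object(residual)), Some(Typed::Object(fields))) => {
            let mut out = residual.clone();
            for (name, field) in fields {
                if let Some(v) = unshred(field)? {
                    // The spec forbids a field appearing in both `value` and `typed_value`.
                    if out.insert(name.clone(), v).is_some() {
                        return Err(format!("field {name:?} is in both value and typed_value"));
                    }
                }
            }
            Ok(Some(Variant::Object(out)))
        }
        _ => Err("both columns set, but not a partially shredded object".to_string()),
    }
}

/// Convert a shredded (typed) value back to a Variant; missing object fields are dropped.
fn typed_to_variant(t: &Typed) -> Result<Variant, String> {
    Ok(match t {
        Typed::Int64(i) => Variant::Int64(*i),
        Typed::String(s) => Variant::String(s.clone()),
        Typed::Object(fields) => {
            let mut out = BTreeMap::new();
            for (name, field) in fields {
                if let Some(v) = unshred(field)? {
                    out.insert(name.clone(), v);
                }
            }
            Variant::Object(out)
        }
    })
}
```

Missing fields (both columns null) are dropped from the rebuilt object rather than turned into `Variant::Null`, since the spec distinguishes a missing field from one whose value is null.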
   
   Implementing this functionality will likely require the basic representation for shredded Variant arrays, along with path traversal in `variant_get`. However, it does **NOT** cover the following (which are or will be broken out into separate tickets):
   - Support for retrieving as a specific non-Struct data type (e.g. `Some(DataType::Utf8)`)
   - Retrieving any arbitrary path and returning what is there (no type specified)
   - Retrieving an arbitrary path as a "Struct" (aka implementing shredding)
   
   
   **Describe the solution you'd like**
   @scovich sketched out a high-level design for Shredded Objects (see [Representing Variant In Arrow Proposal: "Shredding an Object"](https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?tab=t.0#heading=h.wediefuitb91) and [Variant Shredding: Objects](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects)) in this PR:
   - https://github.com/apache/arrow-rs/pull/8122
   
   This likely requires reusing some of the logic in the `cast_to_variant` kernel to convert typed columns into Variants.
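Converting typed columns back into Variants is column-wise work, similar in spirit to what `cast_to_variant` does for a typed array. As a rough sketch, assuming a `typed_value` column shredded as `Int64` and a `value` column that is already decoded (both modeled as plain `Vec`s rather than Arrow arrays), each row takes whichever column is non-null. The `Variant` enum and `unshred_int64_column` function are hypothetical, and the handling of rows where both columns are null is an assumption of this sketch.

```rust
/// Hypothetical, simplified decoded Variant.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int64(i64),
    String(String),
}

/// Hypothetical helper: rebuild one unshredded Variant per row from a decoded
/// `value` column and an Int64 `typed_value` column.
/// Assumption: a row where both columns are null is treated as a null row (None).
fn unshred_int64_column(
    value: &[Option<Variant>],
    typed_value: &[Option<i64>],
) -> Result<Vec<Option<Variant>>, String> {
    if value.len() != typed_value.len() {
        return Err("value and typed_value must have the same length".to_string());
    }
    value
        .iter()
        .zip(typed_value)
        .enumerate()
        .map(|(row, (v, t))| match (v, t) {
            (None, None) => Ok(None),
            // Row was not shredded: keep the original Variant from `value`.
            (Some(v), None) => Ok(Some(v.clone())),
            // Row was shredded: turn the typed Int64 back into a Variant.
            (None, Some(i)) => Ok(Some(Variant::Int64(*i))),
            // Both set is only valid for partially shredded objects, not Int64.
            (Some(_), Some(_)) => Err(format!("row {row}: both value and typed_value are set")),
        })
        .collect()
}
```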
   
   So roughly that means supporting
   ```rust
   // get the named field of a variant object, returned as a Variant
   variant_get(array, "$.field_name", Variant)
   ```
   
   Where `$.field_name` represents some arbitrary `VariantPath`, such as `a` for field "a", or `a.b` for field "b" of field "a".
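A minimal sketch of dotted-path parsing and traversal over an unshredded object, assuming a path is just a sequence of field names. `parse_path` and `get_path` are hypothetical helpers, not the arrow-rs `VariantPath` API, and they ignore array indices, quoting, and escaped dots.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified decoded Variant.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Int64(i64),
    Object(BTreeMap<String, Variant>),
}

/// Hypothetical helper: split a dotted path such as "a.b" (optionally written
/// "$.a.b") into field names. No array indices, quoting, or escaped dots.
fn parse_path(path: &str) -> Vec<String> {
    let path = path.strip_prefix('$').unwrap_or(path);
    let path = path.strip_prefix('.').unwrap_or(path);
    if path.is_empty() {
        return Vec::new(); // the empty path selects the whole Variant
    }
    path.split('.').map(str::to_string).collect()
}

/// Hypothetical helper: follow `path` through nested objects. Returns `None`
/// if a field is absent or an intermediate value is not an object.
fn get_path<'a>(mut v: &'a Variant, path: &[String]) -> Option<&'a Variant> {
    for field in path {
        match v {
            Variant::Object(fields) => v = fields.get(field)?,
            _ => return None,
        }
    }
    Some(v)
}
```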
   
   
   This should work for:
   1. Variants where the field is shredded into `typed_value`
   2. Variants where the field is not in `typed_value` (i.e. it is stored in the unshredded `value`)
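Both cases can be sketched for a single row of a non-nested, partially shredded object. This assumes decoded values and a shredded schema whose fields are all `Int64`; `ShreddedField`, `ShreddedObject`, and `get_field` are hypothetical names, not the arrow-rs API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified decoded Variant.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int64(i64),
    Object(BTreeMap<String, Variant>),
}

/// Hypothetical shredded field: a `value` / `typed_value` pair, shredded as Int64.
#[derive(Debug)]
struct ShreddedField {
    value: Option<Variant>,
    typed_value: Option<i64>,
}

/// Hypothetical row of a non-nested, partially shredded object.
#[derive(Debug)]
struct ShreddedObject {
    /// Residual object holding the fields that were not shredded.
    value: Option<Variant>,
    /// Shredded fields, keyed by name.
    typed_value: Option<BTreeMap<String, ShreddedField>>,
}

/// Return field `name` of one row as an unshredded Variant (`Ok(None)` if absent).
fn get_field(row: &ShreddedObject, name: &str) -> Result<Option<Variant>, String> {
    // Case 1: the field is part of the shredded schema, so it lives in `typed_value`.
    if let Some(field) = row.typed_value.as_ref().and_then(|fields| fields.get(name)) {
        return match (&field.value, field.typed_value) {
            (None, None) => Ok(None), // field is missing in this row
            (None, Some(i)) => Ok(Some(Variant::Int64(i))),
            // The value did not match the shredded type, so it was kept in `value`.
            (Some(v), None) => Ok(Some(v.clone())),
            (Some(_), Some(_)) => Err(format!("field {name:?}: both value and typed_value are set")),
        };
    }
    // Case 2: the field is not shredded, so look for it in the residual `value` object.
    match &row.value {
        Some(Variant::Object(residual)) => Ok(residual.get(name).cloned()),
        // No residual object (or the row is not an object): the field is absent.
        _ => Ok(None),
    }
}
```

Returning as soon as the field is found in `typed_value` relies on the spec's rule that a shredded field must not also appear in the residual `value` object.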
   
   **Describe alternatives you've considered**
   1. Add a test that manually constructs a shredded variant array (following the example in the Arrow proposal)
   2. Add a test that calls `variant_get` appropriately
   3. Implement the code
   
   I suggest getting this working for non-nested objects first, and then working on nesting / pathing in a second PR.
   
   
   **Additional context**
   
   
   
   Reference
   - [Variant Spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types)
   - [Variant Shredding Spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#value-shredding)
   - [Representing Variant In Arrow Proposal](https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4)

