alamb commented on issue #7941:
URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090534233

   @alamb in https://github.com/apache/arrow-rs/pull/7915#discussion_r2203360483
   
   > Shredded fields need a full blown variant builder, because they're strongly typed and we need to encode them as variant.
   
   > When unshredding, we need to turn a, b, and x from strongly typed values into variant objects -- the latter recursively so -- which requires a variant builder. The recursive unshredding of j also requires a builder of its own.
   
   I see: you are imagining unshredding the variant *physically* (i.e., forming the bytes for a new unshredded variant `value` using a `VariantBuilder`) in order to access the shredded fields.
   
   What if we instead made a *view* into the variant, so we didn't have to copy anything to read the `Variant`? If we could get a `Variant` that accessed the shredded fields directly, physically unshredding would become an exercise in creating a new `VariantBuilder` and calling `append_value`:
   
   ### Hypothetical `unshred` kernel
   
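   ```rust
   use parquet_variant::{Variant, VariantBuilder};

   // Given a shredded variant exposed as a `Variant` that is itself a view on
   // the shredded columns, physically unshredding is just a deep copy
   fn unshred(shredded_variant: Variant<'_, '_>) -> (Vec<u8>, Vec<u8>) {
     let mut variant_builder = VariantBuilder::new();
     variant_builder.append_value(shredded_variant);
     // returns the (metadata, value) buffers of the new, unshredded variant
     variant_builder.finish()
   }
   ```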
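   To make the "view" idea concrete without depending on the real `parquet-variant` crate, here is a minimal, self-contained sketch. `View`, `Owned`, `Builder`, and `deep_copy` are hypothetical stand-ins, not the crate's API: a borrowed `View` exposes both the unshredded and the shredded fields of an object, and `unshred` simply appends that view to a fresh builder, which deep-copies whatever the view exposes.

   ```rust
   use std::collections::BTreeMap;

   // Hypothetical stand-in (NOT the parquet-variant API): a borrowed view that
   // sees both unshredded (`value`) and shredded (`typed_value`) fields.
   #[derive(Debug, Clone, Copy)]
   enum View<'a> {
       Int(i64),
       Str(&'a str),
       Object {
           value_fields: &'a [(&'a str, View<'a>)],
           shredded_fields: &'a [(&'a str, View<'a>)],
       },
   }

   // Hypothetical owned result of unshredding (a real builder would emit bytes).
   #[derive(Debug, PartialEq)]
   enum Owned {
       Int(i64),
       Str(String),
       Object(BTreeMap<String, Owned>),
   }

   // Recursively copy a view, merging shredded and unshredded object fields.
   fn deep_copy(v: View<'_>) -> Owned {
       match v {
           View::Int(i) => Owned::Int(i),
           View::Str(s) => Owned::Str(s.to_string()),
           View::Object { value_fields, shredded_fields } => Owned::Object(
               value_fields
                   .iter()
                   .chain(shredded_fields.iter())
                   .map(|(name, field)| (name.to_string(), deep_copy(*field)))
                   .collect(),
           ),
       }
   }

   // Hypothetical builder with an `append_value`-style entry point.
   #[derive(Default)]
   struct Builder {
       out: Option<Owned>,
   }

   impl Builder {
       fn append_value(&mut self, v: View<'_>) {
           self.out = Some(deep_copy(v));
       }
       fn finish(self) -> Option<Owned> {
           self.out
       }
   }

   // Physically unshredding = appending the view to a new builder.
   fn unshred(shredded_variant: View<'_>) -> Option<Owned> {
       let mut builder = Builder::default();
       builder.append_value(shredded_variant);
       builder.finish()
   }
   ```

   The `BTreeMap` only gives the owned result a deterministic field order; a real `VariantBuilder` would instead write new metadata and value buffers.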
   
   ## Shredded Variants as "Views"
   The question then is how to implement such a view. Accessing shredded primitive types as a `Variant` is straightforward:
   
   ```rust
   use parquet_variant::Variant;

   fn shredded_int_as_variant(value: i64) -> Variant<'static, 'static> {
     Variant::from(value)
   }
   // ... other primitive types here ...
   fn shredded_str_as_variant(value: &str) -> Variant<'_, '_> {
     Variant::from(value)
   }
   ```
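   As a minimal illustration of why primitives are easy, here is a self-contained sketch with a hypothetical, cut-down `Variant<'v>` (not the `parquet-variant` type). It assumes, for illustration only, a shredded string column stored Arrow-style as an offsets array plus one contiguous buffer; reading a row yields a `Variant` that borrows from that buffer, so nothing is copied.

   ```rust
   // Hypothetical, cut-down stand-in for `Variant` (NOT the parquet-variant type).
   #[derive(Debug, PartialEq)]
   enum Variant<'v> {
       Int64(i64),
       String(&'v str),
   }

   impl From<i64> for Variant<'_> {
       fn from(value: i64) -> Self {
           Variant::Int64(value)
       }
   }

   impl<'v> From<&'v str> for Variant<'v> {
       fn from(value: &'v str) -> Self {
           Variant::String(value)
       }
   }

   /// Read row `row` of a shredded string column (assumed layout: `offsets`
   /// delimit each row's bytes in `data`) as a `Variant` borrowing from `data`.
   fn shredded_str_as_variant<'v>(offsets: &[usize], data: &'v str, row: usize) -> Variant<'v> {
       Variant::from(&data[offsets[row]..offsets[row + 1]])
   }
   ```

   For example, with `data = "helloworld"` and `offsets = [0, 5, 10]`, row 1 reads as `Variant::String("world")`, a slice pointing into `data` rather than a new allocation.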
   
   The tricky bit is accessing partially shredded objects. Maybe we could add the shredded fields to `VariantObject`, something like:
   
   ```rust
   struct VariantObject<'m, 'v> {
     // ... existing fields ...
     // fields that were shredded and live elsewhere (e.g. in `typed_value`)
     shredded_fields: IndexMap<&'v str, Variant<'m, 'v>>,
   }
   ```
   
   Then we could update all the accessor methods (field lookup, iteration, length, etc.) to also consult `shredded_fields`.
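   A minimal, self-contained sketch of what updating the accessors might look like, using hypothetical, simplified owned types rather than the real `VariantObject` (and a `Vec` of pairs instead of `IndexMap`, to avoid a dependency). Lookups and the field count cover both the fields decoded from `value` and the shredded fields, so callers need not know where a field lives.

   ```rust
   // Hypothetical, simplified types (NOT the parquet-variant API).
   #[derive(Debug, Clone, PartialEq)]
   enum Variant {
       Int(i64),
       Str(String),
       Object(VariantObject),
   }

   #[derive(Debug, Clone, PartialEq)]
   struct VariantObject {
       // fields decoded from the unshredded `value` bytes
       value_fields: Vec<(String, Variant)>,
       // fields that were shredded and live elsewhere (`typed_value`)
       shredded_fields: Vec<(String, Variant)>,
   }

   impl VariantObject {
       /// Look up a field by name in either the unshredded or the shredded fields.
       fn get(&self, name: &str) -> Option<&Variant> {
           self.value_fields
               .iter()
               .chain(self.shredded_fields.iter())
               .find(|(field_name, _)| field_name.as_str() == name)
               .map(|(_, value)| value)
       }

       /// Number of fields visible through the object, shredded or not.
       fn len(&self) -> usize {
           self.value_fields.len() + self.shredded_fields.len()
       }
   }
   ```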
   
   An alternative mechanism, which might be more explicit, would be a new variant in the `Variant` enum, like:
   
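   ```rust
   struct ShreddedObject<'m, 'v> {
     // the unshredded fields (from the `value` column)
     object: VariantObject<'m, 'v>,
     // the shredded fields (from the `typed_value` column)
     shredded_fields: IndexMap<&'v str, Variant<'m, 'v>>,
   }

   enum Variant<'m, 'v> {
     // ... existing variants ...
     ShreddedObject(ShreddedObject<'m, 'v>),
   }
   ```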
   
   We could `Box` the shredded fields to keep the size of `Variant` down if needed.
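   To see why boxing helps, here is a small self-contained check with hypothetical stand-in types (a fixed array in place of `VariantObject`'s fields and a `Vec` in place of `IndexMap`). A Rust enum is at least as large as its largest variant, so storing the object inline makes every `Variant` pay for it, while a `Box` shrinks that variant to a single pointer.

   ```rust
   use std::mem::size_of;

   // Hypothetical stand-in for `ShreddedObject` (not the real types).
   #[allow(dead_code)]
   struct ShreddedObject {
       object: [usize; 4],                  // stand-in for `VariantObject`'s fields
       shredded_fields: Vec<(String, i64)>, // stand-in for the shredded field map
   }

   // Shredded object stored inline: every value is at least as big as it.
   #[allow(dead_code)]
   enum InlineVariant {
       Int64(i64),
       ShreddedObject(ShreddedObject),
   }

   // Shredded object boxed: that variant is just a pointer.
   #[allow(dead_code)]
   enum BoxedVariant {
       Int64(i64),
       ShreddedObject(Box<ShreddedObject>),
   }

   /// Returns (size of the inline enum, size of the boxed enum) in bytes.
   fn variant_sizes() -> (usize, usize) {
       (size_of::<InlineVariant>(), size_of::<BoxedVariant>())
   }
   ```

   On a typical 64-bit target the boxed enum is 16 bytes (an 8-byte payload plus a tag), whereas the inline one is at least the 56 bytes of the stand-in payload; the trade-off is an extra heap allocation per shredded object.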
   
   ## How does this handle the example?
   
   
   > Imagine a partially shredded variant column `v`:
   >
   > * Fields `a`, `m` and `x` live in the `typed_value` column as a perfectly shredded struct
   >
   >   * Furthermore, `x` is itself a shredded struct with its own fields `i` and `j`
   >
   >     * Furthermore, `i` is a variant column (also using the same top-level metadata)
   >     * Furthermore, `j` is a partially shredded struct
   > * Fields `b`, `n`, and `y` live in the `value` column as a variant object
   
   
   I think the view approach could handle this example:
   
   * `v` is a `Variant::Object(..)` with `shredded_fields`
     * `v`'s fields in the `value` column contain `b`, `n`, and `y`
     * `v`'s `shredded_fields` contains the `Variant`s for `a`, `m`, and `x`
       * `x` is a `VariantObject()` ...
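   The walkthrough above can be modeled with a small, self-contained sketch. The types are hypothetical simplifications (one row, already-decoded values, `Vec` instead of `IndexMap`), and the sub-fields `p` and `q` of `j`, as well as all the values, are made up for illustration. Each object view holds the fields from its `value` column plus the fields from its `typed_value` column, and a path lookup such as `x.j.q` walks through shredded and unshredded levels alike.

   ```rust
   // Hypothetical, simplified model of the "view" idea (NOT the parquet-variant API).
   #[derive(Debug, PartialEq)]
   enum Variant {
       Int(i64),
       Str(&'static str),
       Object(ShreddedObject),
   }

   /// An object view: `value_fields` come from the object's `value` column,
   /// `shredded_fields` from its `typed_value` column.
   #[derive(Debug, PartialEq)]
   struct ShreddedObject {
       value_fields: Vec<(&'static str, Variant)>,
       shredded_fields: Vec<(&'static str, Variant)>,
   }

   impl ShreddedObject {
       fn get(&self, name: &str) -> Option<&Variant> {
           self.value_fields
               .iter()
               .chain(&self.shredded_fields)
               .find(|(field_name, _)| *field_name == name)
               .map(|(_, value)| value)
       }
   }

   impl Variant {
       /// Follow a path of field names through nested (possibly shredded) objects.
       fn get_path(&self, path: &[&str]) -> Option<&Variant> {
           path.iter().try_fold(self, |current, name| match current {
               Variant::Object(obj) => obj.get(name),
               _ => None,
           })
       }
   }

   fn object(value: Vec<(&'static str, Variant)>, shredded: Vec<(&'static str, Variant)>) -> Variant {
       Variant::Object(ShreddedObject { value_fields: value, shredded_fields: shredded })
   }

   /// One row of the example column `v` (sub-fields of `j` and all values are made up).
   fn example_v() -> Variant {
       // `j` is partially shredded: `p` stays in `value`, `q` is shredded
       let j = object(vec![("p", Variant::Int(1))], vec![("q", Variant::Int(2))]);
       // `x` is perfectly shredded: `i` is a variant column, `j` a partially shredded struct
       let x = object(vec![], vec![("i", Variant::Str("any variant")), ("j", j)]);
       // `b`, `n`, `y` live in `v`'s `value` column; `a`, `m`, `x` in `typed_value`
       object(
           vec![("b", Variant::Int(10)), ("n", Variant::Int(11)), ("y", Variant::Int(12))],
           vec![("a", Variant::Int(20)), ("m", Variant::Int(21)), ("x", x)],
       )
   }
   ```

   For instance, `example_v().get_path(&["x", "j", "q"])` resolves to `Variant::Int(2)` even though `q` sits two shredded levels down, while `["b"]` resolves from `v`'s own `value` column.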
   