alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090534233
@alamb in https://github.com/apache/arrow-rs/pull/7915#discussion_r2203360483

> Shredded fields need a full blown variant builder, because they're strongly typed and we need to encode them as variant.

> When unshredding, we need to turn a, b, and x from strongly typed values into variant objects -- the latter recursively so -- which requires a variant builder. The recursive unshredding of j also requires a builder of its own.

I see -- you are imagining unshredding the variant *physically* (aka forming the bytes for a new unshredded variant `value` using a `VariantBuilder`) to access the shredded fields.

What if we made a *view* into the variant so we didn't have to copy anything to read the Variant?

If we could get a `Variant` that accessed the shredded fields, physically unshredding becomes an exercise in creating a new `builder` and calling `append_value`:

### Hypothetical `unshred` kernel

```rust
// Given a shredded variant that is itself a Variant and a view on the shredded columns
fn unshred(shredded_variant: Variant) -> (Vec<u8>, Vec<u8>) {
    // deep copy the shredded variant
    let mut variant_builder = VariantBuilder::new();
    variant_builder.append_value(shredded_variant);
    // return the (metadata, value) buffers of the new, unshredded variant
    variant_builder.finish()
}
```

## Shredded Variants as "Views"

The question then is how to implement a view.

Accessing primitive types as a Variant is straightforward:

```rust
fn shredded_int_as_variant(value: i64) -> Variant {
    Variant::from(value)
}
// ... other primitive types here ..
fn shredded_str_as_variant(value: &str) -> Variant {
    Variant::from(value)
}
```

The tricky bit is accessing partially shredded objects. Maybe we could add the shredded fields to `VariantObject`, something like

```rust
struct VariantObject {
    // ... existing fields ..
    // fields that were shredded and live elsewhere
    shredded_fields: IndexMap<&str, Variant>
}
```

Then we can update all the accessor methods, etc. to consult `shredded_fields` as well.
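
To make the accessor idea a bit more concrete, here is a minimal, self-contained sketch of a field lookup that consults both sets of fields. It is only an illustration under stated assumptions: `ToyVariant` and `ToyObject` are hypothetical stand-ins (not the real `Variant` / `VariantObject` types), `BTreeMap` from the standard library is swapped in for `IndexMap` to avoid an extra dependency, and it assumes a given field name appears in only one of the two maps.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for `Variant`; NOT the parquet-variant type.
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int(i64),
    Str(String),
    Object(ToyObject),
}

// Hypothetical stand-in for `VariantObject` extended with `shredded_fields`.
#[derive(Debug, Clone, PartialEq, Default)]
struct ToyObject {
    // fields still encoded in the `value` column
    value_fields: BTreeMap<String, ToyVariant>,
    // fields that were shredded and live elsewhere (the `typed_value` column)
    shredded_fields: BTreeMap<String, ToyVariant>,
}

impl ToyObject {
    // Look in the shredded fields first, then fall back to the `value` fields.
    fn get(&self, name: &str) -> Option<&ToyVariant> {
        self.shredded_fields
            .get(name)
            .or_else(|| self.value_fields.get(name))
    }

    // Total number of fields, assuming no name is present in both maps.
    fn len(&self) -> usize {
        self.value_fields.len() + self.shredded_fields.len()
    }
}
```

With something like this, a caller asking for a field does not need to know whether it was shredded; the lookup only borrows from the two maps and copies nothing.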

An alternate mechanism that might be more explicit would be a new variant in the `Variant` enum, like

```rust
struct ShreddedObject {
    object: VariantObject,
    shredded_fields: IndexMap<&str, Variant>
}

enum Variant {
    // ... existing variants ..
    ShreddedObject(..)
}
```

We could `Box` the shredded fields to keep the size of `Variant` down if needed.

## How does this handle the example?

> Imagine a partially shredded variant column `v`:
>
> * Fields `a`, `m` and `x` live in the `typed_value` column as a perfectly shredded struct
>   * Furthermore, `x` is itself a shredded struct with its own fields `i` and `j`
>     * Furthermore, `i` is a variant column (also using the same top-level metadata)
>     * Furthermore, `j` is a partially shredded struct
> * Fields `b`, `n`, and `y` live in the `value` column as a variant object

I think the view approach could handle this example. For example

* `v` is a `Variant::Object(..)` with `shredded_fields`
* `v`'s fields in the `value` column contain `b`, `n` and `y`
* `v::shredded_fields` contains the `Variant`s for `a`, `m`, and `x`
* `x` is a `VariantObject()` ...
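
As a rough sketch of how the example could play out, the toy model below builds `v` as a view and then "unshreds" it with a single recursive walk, which is the role `append_value` would play with a real builder. Everything here is hypothetical and simplified: `View`, `Plain`, `object`, `example_v` and this `unshred` are illustration-only names (not the parquet-variant API), all leaf values are integers, and the fields `p` and `q` of `j` are made up because the example does not name them.

```rust
use std::collections::BTreeMap;

// Hypothetical view over a (possibly shredded) variant; NOT the real `Variant`.
#[derive(Debug, Clone, PartialEq)]
enum View {
    Int(i64),
    // An object whose fields are split between the `value` column
    // and the shredded `typed_value` column.
    Object {
        value_fields: BTreeMap<String, View>,
        shredded_fields: BTreeMap<String, View>,
    },
}

// Hypothetical fully unshredded result.
#[derive(Debug, Clone, PartialEq)]
enum Plain {
    Int(i64),
    Object(BTreeMap<String, Plain>),
}

// Deep copy a view into a plain value, merging both kinds of fields.
fn unshred(view: &View) -> Plain {
    match view {
        View::Int(i) => Plain::Int(*i),
        View::Object { value_fields, shredded_fields } => Plain::Object(
            value_fields
                .iter()
                .chain(shredded_fields.iter())
                .map(|(name, field)| (name.clone(), unshred(field)))
                .collect(),
        ),
    }
}

// Convenience constructor for an object view (illustration only).
fn object(value: &[(&str, View)], shredded: &[(&str, View)]) -> View {
    fn to_map(fields: &[(&str, View)]) -> BTreeMap<String, View> {
        fields.iter().map(|(k, v)| (k.to_string(), v.clone())).collect()
    }
    View::Object {
        value_fields: to_map(value),
        shredded_fields: to_map(shredded),
    }
}

// The column `v` from the example, with made-up integer values.
fn example_v() -> View {
    // `j` is partially shredded: `p` is shredded, `q` stays in `value` (both names assumed).
    let j = object(&[("q", View::Int(6))], &[("p", View::Int(5))]);
    // `x` is perfectly shredded into `i` and `j`.
    let x = object(&[], &[("i", View::Int(4)), ("j", j)]);
    // `v` keeps `b`, `n`, `y` in `value`; `a`, `m`, `x` are shredded.
    object(
        &[("b", View::Int(2)), ("n", View::Int(8)), ("y", View::Int(9))],
        &[("a", View::Int(1)), ("m", View::Int(7)), ("x", x)],
    )
}
```

Calling `unshred(&example_v())` yields a single plain object with all six top-level fields, with `x` and `j` flattened recursively. The recursion lives in one place and needs no per-level builder in this toy model.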