alamb commented on issue #7715:
URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-3058435051
> We should start figuring out what that low-level support looks like. A likely starting point would be the ability to insert and remove specific variant values from an existing variant object. These should be cheap byte-shuffling operations that don't waste time introspecting unrelated parts of the variant value buffer. And it needs to be efficient even when doing recursive inserts and removes as part of a partial (un)shredding operation.

@friendlymatthew and I spoke a bit about this API today. Here is what I think I heard: add new kernels (in the `parquet-variant-compute` crate that @harshmotw-db is making in https://github.com/apache/arrow-rs/pull/7884), something like the following.

## Field Access

The first kernel we need is something to extract a field from a variant. Here is a Databricks function that does this: https://docs.databricks.com/gcp/en/sql/language-manual/functions/variant_get

```rust
/// Given a StructArray with a Variant value stored as `metadata`, `value`, and optionally `typed_value` fields,
/// returns the specified field.
///
/// The returned array might be another Variant StructArray, or a primitive or StringArray
/// if the requested field was shredded.
pub fn variant_get(variant_array: StructArray, path: VariantPath) -> Result<ArrayRef> { .. }
```

Open questions:

1. What should the "path" argument be? A String? A JSON path? Some structured thing (`Vec<PathSegment>`)?
2. Should we also provide a "requested data type" argument, similar to the Databricks function?

```rust
/// Given a StructArray with a Variant value stored as `metadata`, `value`, and optionally `typed_value` fields,
/// returns the specified field, cast to `as_type` if specified.
///
/// If `as_type` is None, the returned array might be another Variant StructArray, or a primitive or StringArray
/// if the requested field was shredded.
///
/// If `as_type` is Some(type), the field is returned as the specified type. To specify returning
/// a Variant, pass a Field with the variant type in the metadata.
pub fn variant_get(variant_array: StructArray, path: VariantPath, as_type: Option<&Field>) -> Result<ArrayRef> { .. }
```

## Shredding Kernel

```rust
/// Given a StructArray with a Variant value stored in the `metadata` and `value` fields,
/// returns a new StructArray with `metadata`, `value`, and `typed_value` fields
/// that have the specified columns "shredded" into strongly typed columns.
pub fn shred_variant(variant_array: StructArray, spec: ShreddingSpecification) -> StructArray { .. }
```

Open questions:

1. What does `ShreddingSpecification` look like? (We could look at the API in iceberg-java to figure this out.)
2. What should happen if the input `variant_array` already has some shredded columns?
3. Do we need an `unshred_variant` kernel?
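On the first `variant_get` open question (what the "path" argument should be), one option is a structured path type that a string form parses into. The following is a minimal, std-only sketch under stated assumptions: `PathSegment`, `VariantPath`, and the accepted syntax (a small JSONPath-like subset such as `$.a.b[0].c`) are hypothetical here and not an existing arrow-rs API.

```rust
/// One step in a path into a variant value (hypothetical type, not an arrow-rs API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PathSegment {
    /// Select a field of a variant object by name
    Field(String),
    /// Select an element of a variant array by position
    Index(usize),
}

/// A structured path: an ordered list of segments (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct VariantPath(pub Vec<PathSegment>);

impl VariantPath {
    /// Parses a small JSONPath-like subset: an optional leading `$`, then any
    /// number of `.name` and `[index]` segments, e.g. `$.a.b[0].c`.
    /// Quoted names, wildcards, and negative indexes are not supported.
    pub fn parse(input: &str) -> Result<Self, String> {
        let mut rest = input.strip_prefix('$').unwrap_or(input);
        let mut segments = Vec::new();
        while !rest.is_empty() {
            if let Some(after_dot) = rest.strip_prefix('.') {
                // A field name runs until the next `.`, `[`, or `]`
                let end = after_dot
                    .find(|c: char| c == '.' || c == '[' || c == ']')
                    .unwrap_or(after_dot.len());
                let name = &after_dot[..end];
                if name.is_empty() {
                    return Err(format!("empty field name in {input:?}"));
                }
                segments.push(PathSegment::Field(name.to_string()));
                rest = &after_dot[end..];
            } else if let Some(after_bracket) = rest.strip_prefix('[') {
                // An index is a non-negative integer terminated by `]`
                let close = after_bracket
                    .find(']')
                    .ok_or_else(|| format!("unclosed '[' in {input:?}"))?;
                let index = after_bracket[..close]
                    .parse::<usize>()
                    .map_err(|e| format!("invalid index in {input:?}: {e}"))?;
                segments.push(PathSegment::Index(index));
                rest = &after_bracket[close + 1..];
            } else {
                return Err(format!("expected '.' or '[' in {input:?}"));
            }
        }
        Ok(VariantPath(segments))
    }
}
```

A structured `Vec<PathSegment>` lets Rust callers build paths programmatically without any escaping, while SQL-facing frontends (like Databricks' `variant_get`, which takes a path string) could parse their string into the same type before calling the kernel.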
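For the first shredding open question (what `ShreddingSpecification` looks like), one way to picture it is as a tree of the fields to shred and their target types. This is a hypothetical, std-only sketch: `ShreddingSpecification`, `ShreddedPrimitive`, and `column_paths` are illustrative names, not an arrow-rs or iceberg-java API. `column_paths` lists the columns a shredded StructArray would contain, assuming the Parquet Variant shredding layout in which the top level has `metadata`, `value`, and `typed_value`, and each shredded object field becomes a group with its own `value` and `typed_value`.

```rust
use std::collections::BTreeMap;

/// Leaf types a value can be shredded into (hypothetical, deliberately small).
#[derive(Debug, Clone, PartialEq)]
pub enum ShreddedPrimitive {
    Int64,
    Float64,
    Boolean,
    Utf8,
}

/// Hypothetical description of which parts of a variant to shred.
#[derive(Debug, Clone, PartialEq)]
pub enum ShreddingSpecification {
    /// Shred this value into a strongly typed primitive column
    Primitive(ShreddedPrimitive),
    /// Shred this value as an object with the listed fields;
    /// BTreeMap keeps the field order deterministic
    Object(BTreeMap<String, ShreddingSpecification>),
}

impl ShreddingSpecification {
    /// Lists the leaf columns of the shredded StructArray, assuming the
    /// Parquet Variant shredding layout described above.
    pub fn column_paths(&self) -> Vec<String> {
        let mut out = vec!["metadata".to_string(), "value".to_string()];
        self.collect("typed_value", &mut out);
        out
    }

    fn collect(&self, prefix: &str, out: &mut Vec<String>) {
        match self {
            // A primitive spec is stored directly in the `typed_value` column
            ShreddingSpecification::Primitive(_) => out.push(prefix.to_string()),
            // An object spec makes `typed_value` a group with one group per field
            ShreddingSpecification::Object(fields) => {
                for (name, child) in fields {
                    let field = format!("{prefix}.{name}");
                    // Residual (unshredded) bytes for this field
                    out.push(format!("{field}.value"));
                    // Strongly typed part of this field, possibly nested further
                    child.collect(&format!("{field}.typed_value"), out);
                }
            }
        }
    }
}
```

A real version would more likely use arrow's `DataType` for the leaf types and would also need to describe shredded arrays, which this sketch omits.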