scovich commented on issue #7715: URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-2989079256
> I'm not sure what kernel support for variants should look like, but my ideal would be to find a way to minimise the need for explicit support in kernels outside of those specifically for variants.

This issue is specifically about the low-level (byte slice) support for manipulating individual variant values -- a building block for future higher-level columnar operations -- and that logic should almost certainly live in the parquet-variant crate?

Meanwhile -- one way or another, people will need a way to convert columns of variant values to and from more strongly-typed forms. Otherwise I'm not sure how useful variant would be in practice. Where that higher-level logic lives seems like a secondary question. Probably some arrow-variant crate that depends on arrow-schema, arrow-array, and parquet-variant?

> I had envisaged that variants would need to be materialized/coerced to a concrete arrow array before being processed by arrow kernels such as take, cast, eq, etc...

Not sure how the type system should handle it -- I'm not really familiar with Arrow's logical type system -- but both binary and shredded variant columns physically take the form of structs. So I'd expect the parquet reader to produce a `StructArray` with the appropriate columns (`metadata`, `value`, and/or `typed_value`).

The problem is what to do with it at that point. One could imagine defining kernels like `variant_extract` (which fetches a sub-variant by path, possibly with some form of casting), a variant "reader builder" (similar to the JSON builder) that converts variant values to strongly-typed values (**), a "writer builder" that converts strongly-typed values to variant, etc. All of those would likely need the kind of byte-level manipulation this issue describes.

(**) Problem is, the JSON reader builder has a lot of semantic challenges that people have kind of been expecting variant to solve...
so we have some work to do there if we don't want to just lift and shift the problem from JSON to variant.
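To make the low-level (byte slice) part concrete, here is a minimal, std-only sketch of reading an individual variant value straight from its bytes. It assumes the Parquet Variant binary encoding as I understand the spec: the low 2 bits of the first byte are the basic type (0 = primitive, 1 = short string), and the upper 6 bits are either the primitive type id (0 = null, 1 = true, 2 = false, 3..=6 = little-endian int8..int64) or the short-string byte length. `PrimitiveVariant`, `decode_primitive`, and `le_bytes` are hypothetical names, not the parquet-variant crate's API. Objects and arrays, which also need the metadata dictionary, are deliberately left out.

```rust
/// Hypothetical decoded view of a primitive or short-string variant value.
#[derive(Debug, PartialEq)]
pub enum PrimitiveVariant<'a> {
    Null,
    Bool(bool),
    /// int8, int16, int32 and int64 are all widened to i64 here.
    Int(i64),
    /// Borrowed directly from the input bytes (no copy).
    ShortString(&'a str),
}

/// Copy the first N bytes of `bytes` into a fixed-size array, or None if too short.
fn le_bytes<const N: usize>(bytes: &[u8]) -> Option<[u8; N]> {
    bytes.get(..N)?.try_into().ok()
}

/// Decode one variant value from its byte slice, assuming the Parquet Variant
/// encoding. Returns None for objects, arrays, primitive types not handled
/// here, truncated input, or invalid UTF-8.
pub fn decode_primitive(value: &[u8]) -> Option<PrimitiveVariant<'_>> {
    let (&header, rest) = value.split_first()?;
    let basic_type = header & 0b11; // low 2 bits
    let value_header = header >> 2; // upper 6 bits
    match basic_type {
        // Primitive: value_header is the primitive type id.
        0 => match value_header {
            0 => Some(PrimitiveVariant::Null),
            1 => Some(PrimitiveVariant::Bool(true)),
            2 => Some(PrimitiveVariant::Bool(false)),
            3 => Some(PrimitiveVariant::Int(i8::from_le_bytes(le_bytes::<1>(rest)?) as i64)),
            4 => Some(PrimitiveVariant::Int(i16::from_le_bytes(le_bytes::<2>(rest)?) as i64)),
            5 => Some(PrimitiveVariant::Int(i32::from_le_bytes(le_bytes::<4>(rest)?) as i64)),
            6 => Some(PrimitiveVariant::Int(i64::from_le_bytes(le_bytes::<8>(rest)?))),
            _ => None, // other primitive types are out of scope for this sketch
        },
        // Short string: value_header is the UTF-8 byte length (0..=63).
        1 => {
            let bytes = rest.get(..value_header as usize)?;
            std::str::from_utf8(bytes).ok().map(PrimitiveVariant::ShortString)
        }
        // 2 = object, 3 = array: these need the metadata dictionary, not handled here.
        _ => None,
    }
}
```

For example, `decode_primitive(&[9, b'h', b'i'])` returns `Some(PrimitiveVariant::ShortString("hi"))`, since 9 = (2 << 2) | 1 encodes a short string of length 2. Returning a borrowed `&str` instead of copying is presumably the kind of zero-copy access that higher-level columnar kernels would want to build on.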

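On the footnote: wherever the "reader builder" ends up living, questions like the ones the JSON reader already faces (is a missing value the same as an explicit null? may a `2.0` double become an integer? does a type mismatch produce null or an error?) have to be answered somewhere. Here is one hypothetical way to make those choices explicit parameters rather than implicit behavior. `Variant`, `OnMismatch`, and `variants_to_i64` are illustrative names. The enum is a simplified in-memory stand-in for decoded variant values, and `Vec<Option<i64>>` stands in for an Arrow `Int64Array`; none of this is an existing arrow-rs or parquet-variant API.

```rust
/// Hypothetical, simplified in-memory variant value (a stand-in, not the
/// parquet-variant crate's type).
#[derive(Debug, Clone, PartialEq)]
pub enum Variant {
    Null,
    Bool(bool),
    Int(i64),
    Double(f64),
    String(String),
}

/// What to do when a value cannot be converted to the target type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OnMismatch {
    /// Produce a null for that row (lenient).
    Null,
    /// Fail the whole conversion (strict).
    Error,
}

/// True if `d` is a whole number that fits in i64 without loss.
fn is_lossless_i64(d: f64) -> bool {
    // i64::MIN as f64 is exactly -2^63 and i64::MAX as f64 rounds up to 2^63;
    // NaN and infinities fail the fract() check.
    d.fract() == 0.0 && d >= i64::MIN as f64 && d < i64::MAX as f64
}

/// Convert a column of optional variant values into a strongly-typed i64 column.
/// `None` (missing) and `Variant::Null` both become null; ints pass through;
/// doubles convert only if lossless; everything else (bools, strings, lossy
/// doubles) is a mismatch handled according to `on_mismatch`.
pub fn variants_to_i64(
    column: &[Option<Variant>],
    on_mismatch: OnMismatch,
) -> Result<Vec<Option<i64>>, String> {
    let mut out = Vec::with_capacity(column.len());
    for (row, value) in column.iter().enumerate() {
        let converted = match value {
            None | Some(Variant::Null) => None,
            Some(Variant::Int(i)) => Some(*i),
            Some(Variant::Double(d)) if is_lossless_i64(*d) => Some(*d as i64),
            Some(other) => match on_mismatch {
                OnMismatch::Null => None,
                OnMismatch::Error => {
                    return Err(format!("row {row}: cannot convert {other:?} to i64"))
                }
            },
        };
        out.push(converted);
    }
    Ok(out)
}
```

With `OnMismatch::Null`, the column `[Int(1), missing, Null, Double(2.0), String("3")]` converts to `[Some(1), None, None, Some(2), None]`; with `OnMismatch::Error` the same input fails at row 4. The sketch deliberately does not parse strings into numbers, which is itself one of the policy choices in question.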