scovich commented on issue #8336: URL: https://github.com/apache/arrow-rs/issues/8336#issuecomment-3321623733
While doing a bit of pathfinding for this issue, I realized that both struct (this issue) and list (https://github.com/apache/arrow-rs/issues/8337) support will hit a fundamental limitation/flaw in `VariantArray::value` and the `Variant` class it returns: * `Variant` is a _borrowed_ type, when it comes to complex variant values (objects and arrays) * But unshredding an object or array requires building a new variant value, whose bytes are now owned by the function that created them. * We currently have no way to return owned variant value bytes * Primitive variant values only worked "by accident" because they parse the input bytes to values instead of retaining a reference to those bytes. Meanwhile, `VariantArray::value` has a huge number of call sites, so any change we make here will seriously disrupt the user surface. Things I tried so far, none of which seemed satisfactory: 1. `VariantArray::value` returns a new enum `VariantArrayValue`, with an `Borrowed` variant (a wrapper around`Variant`) and an `Owned` variant (which contains a metadata byte slice and a value byte vec). * PRO: Expedient. A relatively local change that makes clear what's going on. * CON: It's very annoying API surface, because the same borrowing rules that prevent `VariantArray::value` from returning a `Variant` directly also prevent any member method of `VariantArrayValue` from doing so (unless we start delving into unsafe cell magic). 2. `VariantObject` and `VariantList` take `Cow<'v, [u8]>` instead of `&v [u8]`. * PRO: It should hide COW issues from the vast majority of users of `Variant` API. * CON: Several internal methods rely on slice manipulation that is not friendly to an owned byte slice * CON: Bloats those classes from 64B to 72B 3. Some kind of split API in `VariantArray` that replaces the `value` method, e.g. `primitive_value` or `complex_value` * PRO: Only call sites that expect complex owned values would have to deal with them, and in theory no such call sites exist yet because this is currently an unsupported operation. * CON: `s/value/primitive_value/g` will still be a hugely disruptive operation, and it's also not obvious that "primitive" vs. "complex" is really the right place to split it. The real issue is _owned_ vs. _borrowed_, and there's no obvious way for a reader to efficiently determine whether a given row will produce an owned or borrowed value, without doing a lot of up-front work. CC @alamb @codephage2020 @klion26 @liamzwbao @friendlymatthew who have worked on various bits of this code over the past few months. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org