scovich commented on issue #8336:
URL: https://github.com/apache/arrow-rs/issues/8336#issuecomment-3321623733

   While doing a bit of pathfinding for this issue, I realized that both struct 
(this issue) and list (https://github.com/apache/arrow-rs/issues/8337) support 
will hit a fundamental limitation/flaw in `VariantArray::value` and the 
`Variant` class it returns:
   * `Variant` is a _borrowed_ type, when it comes to complex variant values 
(objects and arrays)
   * But unshredding an object or array requires building a new variant value, 
whose bytes are now owned by the function that created them.
   * We currently have no way to return owned variant value bytes
   * Primitive variant values only worked "by accident" because they parse the 
input bytes to values instead of retaining a reference to those bytes. 
   
   Meanwhile, `VariantArray::value` has a huge number of call sites, so any 
change we make here will seriously disrupt the user surface. Things I tried so 
far, none of which seemed satisfactory:
   1. `VariantArray::value` returns a new enum `VariantArrayValue`, with an 
`Borrowed` variant (a wrapper around`Variant`) and an `Owned` variant (which 
contains a metadata byte slice and a value byte vec). 
      * PRO: Expedient. A relatively local change that makes clear what's going 
on.
      * CON: It's very annoying API surface, because the same borrowing rules 
that prevent `VariantArray::value` from returning a `Variant` directly also 
prevent any member method of `VariantArrayValue` from doing so (unless we start 
delving into unsafe cell magic).
   2. `VariantObject` and `VariantList` take `Cow<'v, [u8]>` instead of `&v 
[u8]`.
      * PRO: It should hide COW issues from the vast majority of users of 
`Variant` API. 
      * CON: Several internal methods rely on slice manipulation that is not 
friendly to an owned byte slice
      * CON: Bloats those classes from 64B to 72B
   3. Some kind of split API in `VariantArray` that replaces the `value` 
method, e.g. `primitive_value` or `complex_value`
      * PRO: Only call sites that expect complex owned values would have to 
deal with them, and in theory no such call sites exist yet because this is 
currently an unsupported operation.
      * CON: `s/value/primitive_value/g` will still be a hugely disruptive 
operation, and it's also not obvious that "primitive" vs. "complex" is really 
the right place to split it. The real issue is _owned_ vs. _borrowed_, and 
there's no obvious way for a reader to efficiently determine whether a given 
row will produce an owned or borrowed value, without doing a lot of up-front 
work.
   
   CC @alamb @codephage2020 @klion26 @liamzwbao @friendlymatthew who have 
worked on various bits of this code over the past few months.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to