scovich opened a new pull request, #8324:
URL: https://github.com/apache/arrow-rs/pull/8324

   # Which issue does this PR close?
   
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax.
   
   - Closes #NNN.
   
   # Rationale for this change
   
   The `ParentState` class, combined with `VariantBuilderExt` trait, makes it 
pretty easy to work with variant builders. But it only works for "well-known" 
builder types -- which does not and cannot include the `VariantArrayBuilder` 
because it lives in a different crate. 
   
   This becomes a problem for e.g. 
https://github.com/apache/arrow-rs/issues/8323, because it's currently 
impossible to append multiple values to a `VariantArrayBuilder` -- it needs to 
create and `finish` one`VariantArrayVariantBuilder` adapter for each appended 
value.
   
   Plus, we will eventually need a `VariantValueArrayBuilder` that works with 
read-only metadata, for shredding, unshredding, and projecting variant values. 
Which will undoubtedly encounter the same sorts of problems, since shredding 
and unshredding code relies heavily on `VariantBuilderExt`.
   
   # What changes are included in this PR?
   
   Make `ParentState` customizable, which allows `VariantArrayBuilder` to 
directly implement `VariantBuilderExt`. This simplifies both the array 
builder's implementation and the code that uses it, and also opens the way for 
other custom builders like the `VariantValueArrayBuilder` we will eventually 
need.
   
   NOTE: One (serious) downside of this approach is the use of a boxed trait 
instance. This effectively requires a heap allocation (and virtual method 
dispatch) for every single value appended to a variant array, which I really 
don't love.
   
   If we don't like the overhead of the boxed trait approach, alternatives I've 
considered include:
   * Add new parent state enum variants for each new type of 
`VariantBuilderExt`, even those that come from other crates. 
      * PRO: The least amount of code of any alternative I've considered
      * PRO: Zero additional overhead compared to "native" types
      * CON: Architectural violation to make parquet-variant crate (at least 
somewhat) aware of parquet-variant-compute crate that depends on it.
   * Make the various builder classes generic, and change `ParentState` to a 
(not dyn-compat) trait that becomes a type constraint for those classes.
      * NOTE: `VariantBuilderExt` is already not dyn-compat
      * PRO: Even _less_ overhead than what we have today, because we no longer 
need enum variant dispatch all over the place
      * CON: A lot of code churn to make all the necessary classes generic. Tho 
it's unclear how much that will actually impact users of the API. Messy library 
code isn't necessarily bad, as long as it has a clean user surface.
      * CON: Each impl would have to define _all_ trait fields and nearly all 
methods -- even the ones that are the same for every impl -- to avoid running 
afoul of the rust borrow checker. The same borrow checker issues are why every 
`ParentState` enum variant defines the same half dozen fields over and over, in 
addition to any variant-specific extra state.
   
   # Are these changes tested?
   
   Yes, many unit tests were updated to use the new approach instead of the old 
(removed) approach.
   
   # Are there any user-facing changes?
   
   No, because variant support is still experimental, but:
   * `ParentState` has a new enum variant that references a new public 
`CustomParentState` trait, with corresponding new `ParentState::custom` 
constructor for it.
   * `VariantArrayBuilder` now implements `VariantBuilderExt` directly, and the 
old `VariantArrayVariantBuilder` adapter class has been removed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to