scovich opened a new pull request, #8167:
URL: https://github.com/apache/arrow-rs/pull/8167

   # Which issue does this PR close?
   
   Pathfinding. Even if we like what we see here, it needs to split into a half 
dozen other PR in order to actually merge. But first we need to see the big 
picture to know whether it's worth going further.
   
   Several flavors of pathfinding rolled up in one, but with the overarching 
goal of:
   - https://github.com/apache/arrow-rs/issues/7715#issuecomment-3062004492
   - https://github.com/apache/arrow-rs/issues/8152
   
   An alternate approach to
   - https://github.com/apache/arrow-rs/pull/7915/
   
   # Rationale for this change
   
   While trying to rebase https://github.com/apache/arrow-rs/pull/7915/, I kept 
running into more and more ugliness. And it was already kind of ugly. 
   
   Also, I realized that the `ValueBuffer` use of builders with a generic 
`ParentState::Variant` was incorrect in some rollback scenarios where the 
attempt to insert a value fails (`try_insert_object` or `try_insert_list`).
   
   Further, the interaction between `VariantBuilder` and `VariantArrayBuilder` 
was quite convoluted.
   
   It turns out all of those issues are a bit related, and this PR is the 
result of my exploration.
   
   # What changes are included in this PR?
   
   A lot! In no particular order:
   * Rename `ValueBuffer` as `ValueBuilder` and make it public
   * Heavily rework the design of `ParentState` to be much clearer and 
hopefully easier to use
      * Track `finished` status on behalf of the builders that use it (their 
drop glue goes away)
      * Each variant tracks the state that needs to be rolled back in case 
`finish` is not called. In particular, the value and metadata buffer offsets 
are tracked and rolled back uniformly.
      * New constructors eagerly update the offset/field info for list/object, 
and track the info necessary to roll back that change if the builder fails to 
`finish` after all.
   * The `ObjectBuilder` methods become fallible, because they immediately 
detect and report duplicated field names when that checking is enabled. 
   * `VariantBuilder::finish` becomes infallible (because any failed field 
insertion is caught immediately by the now-fallible `ObjectBuilder` methods). 
However, the method still returns `Result::Ok` for now, to avoid excessive unit 
test churn (removing dozens of `unwrap` and `?` calls scattered all over the 
place)
   * Rework `ValueBuilder::[try_]append_variant` to be an associated function 
instead of a method, taking a `ParentState` from its caller in order to ensure 
correct rollback semantics.
   * Make `ParentState` public
   * The `MetadataBuilder` struct becomes reusable, similar to normal arrow 
array builders -- its `finish` method takes`&mut self`.
   * `VariantArrayBuilder` now directly instantiates a `ValueBuilder` and 
`MetadataBuilder` pair up front, and converts them to byte slices only when its 
own `finish` method is called. It no longer attempts to create a 
`VariantBuilder` at any point, and works directly with the `ValueBuilder` API 
instead. No more `mem::take` magic needed -- the `VariantArrayVariantBuilder` 
helper class just takes a mut ref to its owner and its finish method handles 
the rest; the drop glue becomes empty (just to help catch forgotten `finish` 
calls) because the `ParentState` now deals with all needed cleanup.
   * `VariantArrayBuilder` also tracks starting offsets instead of `(offset, 
len)` pairs -- the latter is easily derived from the former.
   * Turn `MetadataBuilder` into a trait (the original struct is renamed as 
`MetadataBuilderXX` for now, final naming still TBD). Most methods that 
previously took a `MetadataBuilder` struct now take a `&mut dyn 
MetadataBuilder` instead.
   * Deleted a bunch of code originally added to support `VariantArrayBuilder`, 
now that it's no longer needed.
   * Added fallible and infallible pairs for `new_object` and `new_list` to 
`VariantBuilderExt` trait. That way, the existing unit test code that calls 
those methods can keep using the infallible versions (panicking if there's a 
duplicate field name). Prod code that actually cares calls the new infallible 
versions instead.
   * Introduced a basic `ReadOnlyMetadataBuilder` that implements the 
`MetadataBuilder` trait but "mounts" an existing `VariantMetadata`. It takes 
advantage of the fact that object field insertions are now fallible, in order 
to return an error if the requested field name does not exist in the underlying 
metadata dictionary.
   
   # Are these changes tested?
   
   Partly. I need to port over the unit tests from 
https://github.com/apache/arrow-rs/pull/7915/ to fully exercise the new 
`ReadOnlyVariantBuilder`.
   
   # Are there any user-facing changes?
   
   Yes, lots of stuff became public.
   
   Also, the semantics of some functions changed (breaking change)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to