scovich opened a new pull request, #8167: URL: https://github.com/apache/arrow-rs/pull/8167
# Which issue does this PR close? Pathfinding. Even if we like what we see here, it needs to split into a half dozen other PR in order to actually merge. But first we need to see the big picture to know whether it's worth going further. Several flavors of pathfinding rolled up in one, but with the overarching goal of: - https://github.com/apache/arrow-rs/issues/7715#issuecomment-3062004492 - https://github.com/apache/arrow-rs/issues/8152 An alternate approach to - https://github.com/apache/arrow-rs/pull/7915/ # Rationale for this change While trying to rebase https://github.com/apache/arrow-rs/pull/7915/, I kept running into more and more ugliness. And it was already kind of ugly. Also, I realized that the `ValueBuffer` use of builders with a generic `ParentState::Variant` was incorrect in some rollback scenarios where the attempt to insert a value fails (`try_insert_object` or `try_insert_list`). Further, the interaction between `VariantBuilder` and `VariantArrayBuilder` was quite convoluted. It turns out all of those issues are a bit related, and this PR is the result of my exploration. # What changes are included in this PR? A lot! In no particular order: * Rename `ValueBuffer` as `ValueBuilder` and make it public * Heavily rework the design of `ParentState` to be much clearer and hopefully easier to use * Track `finished` status on behalf of the builders that use it (their drop glue goes away) * Each variant tracks the state that needs to be rolled back in case `finish` is not called. In particular, the value and metadata buffer offsets are tracked and rolled back uniformly. * New constructors eagerly update the offset/field info for list/object, and track the info necessary to roll back that change if the builder fails to `finish` after all. * The `ObjectBuilder` methods become fallible, because they immediately detect and report duplicated field names when that checking is enabled. * `VariantBuilder::finish` becomes infallible (because any failed field insertion is caught immediately by the now-fallible `ObjectBuilder` methods). However, the method still returns `Result::Ok` for now, to avoid excessive unit test churn (removing dozens of `unwrap` and `?` calls scattered all over the place) * Rework `ValueBuilder::[try_]append_variant` to be an associated function instead of a method, taking a `ParentState` from its caller in order to ensure correct rollback semantics. * Make `ParentState` public * The `MetadataBuilder` struct becomes reusable, similar to normal arrow array builders -- its `finish` method takes`&mut self`. * `VariantArrayBuilder` now directly instantiates a `ValueBuilder` and `MetadataBuilder` pair up front, and converts them to byte slices only when its own `finish` method is called. It no longer attempts to create a `VariantBuilder` at any point, and works directly with the `ValueBuilder` API instead. No more `mem::take` magic needed -- the `VariantArrayVariantBuilder` helper class just takes a mut ref to its owner and its finish method handles the rest; the drop glue becomes empty (just to help catch forgotten `finish` calls) because the `ParentState` now deals with all needed cleanup. * `VariantArrayBuilder` also tracks starting offsets instead of `(offset, len)` pairs -- the latter is easily derived from the former. * Turn `MetadataBuilder` into a trait (the original struct is renamed as `MetadataBuilderXX` for now, final naming still TBD). Most methods that previously took a `MetadataBuilder` struct now take a `&mut dyn MetadataBuilder` instead. * Deleted a bunch of code originally added to support `VariantArrayBuilder`, now that it's no longer needed. * Added fallible and infallible pairs for `new_object` and `new_list` to `VariantBuilderExt` trait. That way, the existing unit test code that calls those methods can keep using the infallible versions (panicking if there's a duplicate field name). Prod code that actually cares calls the new infallible versions instead. * Introduced a basic `ReadOnlyMetadataBuilder` that implements the `MetadataBuilder` trait but "mounts" an existing `VariantMetadata`. It takes advantage of the fact that object field insertions are now fallible, in order to return an error if the requested field name does not exist in the underlying metadata dictionary. # Are these changes tested? Partly. I need to port over the unit tests from https://github.com/apache/arrow-rs/pull/7915/ to fully exercise the new `ReadOnlyVariantBuilder`. # Are there any user-facing changes? Yes, lots of stuff became public. Also, the semantics of some functions changed (breaking change) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
