scovich commented on code in PR #9791:
URL: https://github.com/apache/arrow-rs/pull/9791#discussion_r3148318454
##########
parquet-variant-compute/src/unshred_variant.rs:
##########
@@ -175,12 +174,17 @@ impl<'a> UnshredVariantRowBuilder<'a> {
}
}
- /// Creates a new UnshredVariantRowBuilder from shredding state
- /// Returns None for None/None case - caller decides how to handle based on context
- fn try_new_opt(shredding_state: BorrowedShreddingState<'a>) -> Result<Option<Self>> {
- let value = shredding_state.value_field();
- let typed_value = shredding_state.typed_value_field();
- let Some(typed_value) = typed_value else {
+ /// Creates a new UnshredVariantRowBuilder from the `(value, typed_value)` pair of a shredded
+ /// variant struct. Returns None for the None/None case - caller decides how to handle based on
+ /// context.
+ fn try_new_opt(inner_struct: &'a StructArray) -> Result<Option<Self>> {
Review Comment:
Hmm. I wonder if we're going about this wrong.

First question: What problem are we actually trying to solve by eliminating
`BorrowedShreddingState`? Is it just annoying to have two similar types, or
is there something more serious?

Second question: What if (thought experiment) we standardized on
`BorrowedShreddingState` everywhere instead?
* Only use `ShreddingState` as an internal helper member of `VariantArray`,
  `ShreddedVariantFieldArray`, etc. Its job is to centralize the name-based
  lookup and validation code; we should probably push `inner` inside as well,
  since that's always there.
* `VariantArray::shredding_state()` then returns
  `self.shredding_state.borrow()` (return type `BorrowedShreddingState<'_>`).
* All functions that currently expect `ShreddingState` change to expect
  `BorrowedShreddingState` instead (I think this is already the case).
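
The layering in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `Column` is a hypothetical stand-in for an Arrow array handle, the two state types are simplified stand-ins that only reuse the real names, and `is_shredded` is a made-up consumer.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow array handle; cloning only bumps a refcount.
type Column = Arc<Vec<Option<i64>>>;

/// Owned state (simplified stand-in): kept private inside the array type,
/// where the name-based lookup and validation would happen once.
#[derive(Debug, Clone)]
struct ShreddingState {
    value: Option<Column>,
    typed_value: Option<Column>,
}

/// Borrowed view (simplified stand-in): the only type consumers ever see.
#[derive(Debug, Clone, Copy)]
struct BorrowedShreddingState<'a> {
    value: Option<&'a Column>,
    typed_value: Option<&'a Column>,
}

impl ShreddingState {
    /// Produces a view without cloning any column handle.
    fn borrow(&self) -> BorrowedShreddingState<'_> {
        BorrowedShreddingState {
            value: self.value.as_ref(),
            typed_value: self.typed_value.as_ref(),
        }
    }
}

impl<'a> BorrowedShreddingState<'a> {
    fn value_field(&self) -> Option<&'a Column> {
        self.value
    }

    fn typed_value_field(&self) -> Option<&'a Column> {
        self.typed_value
    }
}

/// Simplified stand-in for the array type that owns the state.
struct VariantArray {
    shredding_state: ShreddingState,
}

impl VariantArray {
    /// Public accessor hands out the borrowed view only.
    fn shredding_state(&self) -> BorrowedShreddingState<'_> {
        self.shredding_state.borrow()
    }
}

/// Hypothetical consumer: takes the borrowed view by value (it is `Copy`).
fn is_shredded(state: BorrowedShreddingState<'_>) -> bool {
    state.typed_value_field().is_some()
}
```

Under these assumptions, callers never name `ShreddingState` at all, so the "two similar types" overlap becomes an implementation detail of the owning array.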

Third question: Where are we actually cloning `StructArray`, `PrimitiveArray`,
etc. today? Does the proposed change improve the situation, make it worse, or
leave it unchanged? For example, the `VariantArray` and
`ShreddedVariantFieldArray` constructors both clone their input struct array
today, and I don't think the current PR changes that.

Fourth question: Now that we know `VariantArray` is only a temporary helper
that cannot actually `impl Array`, should we revisit the decision to make it an
owned type? If `VariantArray` held references internally instead of owned
values, we could just use borrowed types everywhere and be done with it.
Would the benefits be worth the headaches that causes for code that uses
`VariantArray`?
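
To make the owned-versus-borrowed trade-off concrete, here is a rough sketch. Everything in it is hypothetical: `Column` and `StructColumns` stand in for Arrow's reference-counted arrays, and `OwnedVariantArray` / `BorrowedVariantArray` are illustrative names, not types from the crate. It only counts handle clones; it says nothing about real-world cost.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a reference-counted Arrow column handle.
type Column = Arc<Vec<Option<i64>>>;

// Hypothetical stand-in for the shredded struct array and its optional children.
struct StructColumns {
    value: Option<Column>,
    typed_value: Option<Column>,
}

// Owned flavor: the constructor clones the handles it is given.
struct OwnedVariantArray {
    inner: StructColumns,
}

impl OwnedVariantArray {
    fn new(input: &StructColumns) -> Self {
        Self {
            inner: StructColumns {
                value: input.value.clone(),
                typed_value: input.typed_value.clone(),
            },
        }
    }

    fn typed_value(&self) -> Option<&Column> {
        self.inner.typed_value.as_ref()
    }
}

// Borrowed flavor: no clones, but the view cannot outlive `input`.
struct BorrowedVariantArray<'a> {
    inner: &'a StructColumns,
}

impl<'a> BorrowedVariantArray<'a> {
    fn new(input: &'a StructColumns) -> Self {
        Self { inner: input }
    }

    fn typed_value(&self) -> Option<&'a Column> {
        self.inner.typed_value.as_ref()
    }
}

// Counts live handles to `typed_value` while each flavor is alive.
fn handle_counts(input: &StructColumns) -> (usize, usize) {
    let typed = input.typed_value.as_ref().expect("typed_value present");

    let owned = OwnedVariantArray::new(input);
    let with_owned = Arc::strong_count(typed);
    drop(owned);

    let borrowed = BorrowedVariantArray::new(input);
    let with_borrowed = Arc::strong_count(typed);
    let _ = borrowed.typed_value();

    (with_owned, with_borrowed)
}
```

The headache shows up in the lifetime parameter: any function or struct that wants to hold a `BorrowedVariantArray<'a>` must also keep the underlying struct array alive and thread `'a` through its own signature, whereas the owned flavor can be returned and stored freely at the price of the extra handle clones.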