BlakeOrth commented on issue #9113:
URL: https://github.com/apache/arrow-rs/issues/9113#issuecomment-3730655794
> I recently reviewed a PR that used a VariantBuilder and then manually
built a Variant which was manually inserted to a VariantArrayBuilder... when
VariantArrayBuilder implements VariantBuilderExt and could have been used
directly.
Hm, I'm wondering if the author of that PR had a journey with Variant
similar to mine! I'm actually making this exact same "mistake" independently.
I'll detail my journey a bit to add some color here and explain why I opened this PR.
My use case is converting a row-wise format into Arrow, which will then be
manipulated a bit and eventually written as Parquet. One of the fields is a
`prost_wkt_types::Struct`, which is effectively the gRPC equivalent of JSON and
is well suited to being converted and carried as a Variant. Since this crate
only supports JSON, I needed to do that conversion myself. While I haven't
benchmarked this, data locality suggests I'm likely best off converting an
entire row at a time into multiple Arrow arrays, which put me roughly into the
following code structure:
```rust
// Create builders for each field
let mut ids = StringBuilder::new();
let mut properties = VariantArrayBuilder::new();
// Loop over rows and add the fields
for row in rows {
    ids.append_value(row.id);
    ...
}
// Finish all the builders, eventually returning a RecordBatch
...
```
Given how all the various Arrow Array builders work, the natural solution
for me to look for seemed like:
"I need to build a variant and add it to my array, just like I do with the
other arrays"
With that thought I went to the docs looking for how to construct a `Variant`,
pretty easily ran across the `VariantBuilder`, and proceeded to construct
Variants using the builder and add them either to the array or to a
higher-level `Variant` in the case of nested (list/struct) types. That initial
journey landed me at returning `(Vec<u8>, Vec<u8>)` from several methods when
converting my nested items, which prompted me to want a more expressive type
for that return value.
All that being said, I did end up wising up and at least looking at the
`parquet-variant-json` implementation because this felt pretty clunky and I
thought there's probably a better way. Indeed, I hadn't thought of making a
single `VariantBuilder` and passing `&mut builder` to construct nested lists
and objects. All of the examples I ran across in the docs were relatively
simple and built flat, well-defined objects so nothing really tipped me into
thinking of this as an option. Given that
> seeing a VariantBuilder in arrow code is a likely anti-pattern
it seems I have more work to do in order to make this better!
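As a rough illustration of the `&mut builder` idea, here is a toy sketch using only the standard library. `Value`, `ToyBuilder`, `append_value`, and `convert` are all hypothetical stand-ins and are not the `parquet-variant` or `prost_wkt_types` APIs. The only point is the shape of the recursion: one builder owns the output buffer and nested lists and objects append into it, instead of each level returning its own `(Vec<u8>, Vec<u8>)`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a JSON-like input value.
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Text(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Toy builder that owns a single output buffer (an assumption for
/// illustration only; the real builders write binary Variant data).
#[derive(Default)]
struct ToyBuilder {
    buf: String,
}

impl ToyBuilder {
    fn push(&mut self, s: &str) {
        self.buf.push_str(s);
    }

    fn finish(self) -> String {
        self.buf
    }
}

/// Recursively append `value` into the shared builder. Nested lists and
/// objects reuse the same `&mut ToyBuilder` rather than returning buffers.
fn append_value(builder: &mut ToyBuilder, value: &Value) {
    match value {
        Value::Null => builder.push("null"),
        Value::Bool(b) => builder.push(&b.to_string()),
        Value::Int(n) => builder.push(&n.to_string()),
        Value::Text(s) => {
            builder.push("\"");
            builder.push(s);
            builder.push("\"");
        }
        Value::List(items) => {
            builder.push("[");
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    builder.push(",");
                }
                append_value(builder, item);
            }
            builder.push("]");
        }
        Value::Object(fields) => {
            builder.push("{");
            for (i, (key, field)) in fields.iter().enumerate() {
                if i > 0 {
                    builder.push(",");
                }
                builder.push(key);
                builder.push(":");
                append_value(builder, field);
            }
            builder.push("}");
        }
    }
}

/// Convert one top-level value using a single builder for the whole tree.
fn convert(value: &Value) -> String {
    let mut builder = ToyBuilder::default();
    append_value(&mut builder, value);
    builder.finish()
}
```

With this shape, no intermediate return type is needed for nested items, since every level writes through the same mutable reference.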
However, the fact that another user and I independently reached for the
`VariantBuilder` suggests it's the more natural thing to find, compared with
realizing you can use `VariantArrayBuilder` directly.
My "other solutions" in the initial issue suggested doing nothing, but
perhaps that could be amended to "documentation improvements" to help guide
users towards the preferred solution when a more complex usage of the
VariantArray is needed with manual conversion of nested types etc.?