scovich commented on issue #9113:
URL: https://github.com/apache/arrow-rs/issues/9113#issuecomment-3731096862
> My use case is converting a row-wise format into Arrow which will then be
> manipulated some and eventually written as Parquet. One of the fields is a
> `prost_wkt_types::Struct`, which is effectively the gRPC equivalent of JSON and
> is well suited to being converted and carried as a Variant. Since this crate
> only supports JSON, I needed to do that conversion myself. While I haven't
> benchmarked this, data locality suggests I'm likely best off converting an
> entire row at a time into multiple Arrow arrays, which puts me roughly into the
> following code structure:
>
> // Create builders for each field
> let mut ids = StringBuilder::new();
> let mut properties = VariantArrayBuilder::new();
>
> // loop over rows and add the fields
> for row in rows {
>     ids.append_value(row.id);
>     ...
> }
>
> // Finish all the builders, eventually returning a RecordBatch
> ...
That does sound right. You might check out the basic JSON to variant
converter:
https://github.com/apache/arrow-rs/blob/main/parquet-variant-json/src/from_json.rs#L105-L130
Or the arrow-compute JSON to variant converter which uses it (via macro,
sorry it's harder to read):
https://github.com/apache/arrow-rs/blob/main/parquet-variant-compute/src/from_json.rs#L27-L48
Notice how the basic converter takes `impl VariantBuilderExt`, and then the
compute kernel leverages that to pass `&mut VariantArrayBuilder` via the
JsonToVariant trait:
https://github.com/apache/arrow-rs/blob/main/parquet-variant-json/src/from_json.rs#L66-L79
I realize, looking now, that it's all rather indirect. But in theory, you
could do something that mimics the `append_json` function from my first link
above:
```rust
fn append_prost_struct(
    s: &prost_wkt_types::Struct,
    builder: &mut impl VariantBuilderExt,
) -> Result<(), ...> {
    /* loop over the fields, recursing on nested structs as needed */
}
```
and then just:
```rust
for row in rows {
    ids.append_value(row.id);
    // Propagate conversion errors instead of dropping the `Result`
    append_prost_struct(&row.properties, &mut properties)?;
}
// Finish all the builders
```
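To make the "loop over the fields, recursing on nested structs" part concrete, here is a rough, self-contained sketch of just the recursion. It deliberately does not use the real crates: `PbValue` is a hypothetical stand-in for the protobuf `Struct`/`Value` shape, and `EventSink` is a hypothetical stand-in for whatever builder interface you append into (it is not the `VariantBuilderExt` API). The shape of the match is the point; the actual builder calls would need to follow `append_json`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the JSON-like protobuf value kinds.
/// (A BTreeMap is used here only to get a deterministic field order.)
#[derive(Debug, Clone, PartialEq)]
enum PbValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    List(Vec<PbValue>),
    Struct(BTreeMap<String, PbValue>),
}

/// What the converter emits; a real implementation would call builder
/// methods instead of recording events.
#[derive(Debug, PartialEq)]
enum Event {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    StartList,
    EndList,
    StartObject,
    Key(String),
    EndObject,
}

/// Hypothetical stand-in for the builder interface the converter writes to.
trait EventSink {
    fn emit(&mut self, event: Event);
}

impl EventSink for Vec<Event> {
    fn emit(&mut self, event: Event) {
        self.push(event);
    }
}

/// Append one value: scalars map directly, lists and structs recurse.
fn append_value(value: &PbValue, sink: &mut impl EventSink) {
    match value {
        PbValue::Null => sink.emit(Event::Null),
        PbValue::Bool(b) => sink.emit(Event::Bool(*b)),
        PbValue::Number(n) => sink.emit(Event::Number(*n)),
        PbValue::String(s) => sink.emit(Event::Str(s.clone())),
        PbValue::List(items) => {
            sink.emit(Event::StartList);
            for item in items {
                append_value(item, sink);
            }
            sink.emit(Event::EndList);
        }
        PbValue::Struct(fields) => append_struct(fields, sink),
    }
}

/// Append a struct: one key per field, recursing on each field's value.
fn append_struct(fields: &BTreeMap<String, PbValue>, sink: &mut impl EventSink) {
    sink.emit(Event::StartObject);
    for (name, value) in fields {
        sink.emit(Event::Key(name.clone()));
        append_value(value, sink);
    }
    sink.emit(Event::EndObject);
}
```

Taking the sink as `&mut impl Trait` is what lets the same function serve both the top-level call and the nested calls, which is the same reason `append_json` takes `impl VariantBuilderExt`.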