scovich commented on code in PR #8521:
URL: https://github.com/apache/arrow-rs/pull/8521#discussion_r2395974814
##########
parquet-variant-compute/src/arrow_to_variant.rs:
##########
@@ -38,6 +38,98 @@ use parquet_variant::{
use std::collections::HashMap;
use std::ops::Range;
+// ============================================================================
+// Shared traits and helpers for Arrow-to-Variant conversion
+// ============================================================================
+
+/// Zero-cost trait for converting Arrow array values to Variant
+pub(crate) trait ArrowToVariant: Array {
+ fn append_to_variant_builder(
+ &self,
+ builder: &mut impl VariantBuilderExt,
+ index: usize,
+ ) -> Result<(), ArrowError>;
+}
+
+/// Macro to define ArrowToVariant implementations with optional value
transformation
+macro_rules! define_arrow_to_variant {
Review Comment:
> After I spent some time reviewing this PR I think I would like to propose
some naming consolidation as we have two functions that are inverses of each
other that are confusingly named;
>
> * `cast_to_variant` which casts arrow arrays to variant
>
> * `variant_to_arrow` which casts variants to arrow arrays
>
>
> I will make a PR to propose `cast_arrow_to_variant` and
`cast_variant_to_arrow` for the kernel names and we can keep the modules
`variant_to_arrow` and `arrow_to_variant` with the actual code
`cast_to_variant` is a function that leverages the `arrow_to_variant` module.
`variant_to_arrow` is a module that the `variant_get` function relies on.
I guess `variant_get` could be seen as a superset of a hypothetical
`cast_variant_to_arrow` function that is an inverse of
`cast_[arrow_]to_variant`?
That said, I do think it's a good idea to step back and take a hard look at
naming conventions as the code matures and the interactions (or lack thereof)
become clearer:
* `cast_to_variant` - converts fully strongly-typed data to binary variant
* uses the row builders defined in `arrow_to_variant` module
* `shred_variant` - shreds a binary variant input according to the requested
shredding schema
* uses its own row builders defined in the same module
* `unshred_variant` - converts shredded variant back to binary variant
* uses its own row builders defined in the same module
* `variant_get` - can be used to extract fully strongly-typed data from
variant (shredded or not)
* tries to do columnar operations when possible, but falls back to row
builders defined in the `variant_to_arrow` module when necessary
And, for good measure, some of the conversions probably should move to
`type_conversions` module, and I don't know where the `ListLikeArray` trait
should live?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]