scovich commented on code in PR #8768:
URL: https://github.com/apache/arrow-rs/pull/8768#discussion_r2495168669


##########
parquet-variant-compute/src/variant_to_arrow.rs:
##########
@@ -60,6 +61,9 @@ pub(crate) enum PrimitiveVariantToArrowRowBuilder<'a> {
     TimestampNanoNtz(VariantToTimestampNtzArrowRowBuilder<'a, 
datatypes::TimestampNanosecondType>),
     Time(VariantToPrimitiveArrowRowBuilder<'a, 
datatypes::Time64MicrosecondType>),
     Date(VariantToPrimitiveArrowRowBuilder<'a, datatypes::Date32Type>),
+    Utf8(VariantToUtf8ArrowRowBuilder<'a, i32>),
+    LargeUtf8(VariantToUtf8ArrowRowBuilder<'a, i64>),
+    Binary(VariantToBinaryArrowRowBuilder<'a>),

Review Comment:
   > I think we already have a binary builder 
[here](https://github.com/apache/arrow-rs/blob/a30890e6c4d7379c1de2d376488b1e724deefb3a/parquet-variant-compute/src/variant_to_arrow.rs#L588-L628),
 how about reusing it? Perhaps just change this 
[line](https://github.com/apache/arrow-rs/blob/a30890e6c4d7379c1de2d376488b1e724deefb3a/parquet-variant-compute/src/variant_to_arrow.rs#L332)
 to make it work
   
   The `VariantToBinaryVariantArrowRowBuilder` converts shredded variant to 
unshredded (binary) variant.
   
   Here, we need a builder that can extract `Variant::Binary` values into a 
`BinaryArray`, with nulls (or errors) wherever the variant was not (convertible 
to) binary. 
   
   I believe this PR correctly does the latter, but only for 
`DataType::Binary`. It leaves out `DataType::BinaryView`, 
`DataType::LargeBinary` and `DataType::FixedSizeBinary`. 
   
   The first three can all be covered by a single generic builder, by defining 
a new `BinaryLikeArrayBuilder` trait. It would be very similar to the 
`StringLikeArrayBuilder` trait that #8600 already added for the same reason.
   
   It's debatable whether we actually need fixed-size binary support. If we did 
support it, it would need a custom builder whose extraction would presumably 
fail for any `Variant::Binary` with the wrong length?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to