alamb commented on code in PR #8600:
URL: https://github.com/apache/arrow-rs/pull/8600#discussion_r2442296788


##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -763,6 +764,13 @@ mod test {
         BooleanArray::from(vec![Some(true), Some(false), Some(true)])
     );
 
+    perfectly_shredded_to_arrow_primitive_test!(
+        get_variant_perfectly_shredded_utf8_as_utf8,
+        DataType::Utf8,

Review Comment:
   I think it is perfectly reasonable to call `variant_get` and ask for the 
output to be `LargeUtf8` or `Utf8View`
   
   The Shredding Spec 
(https://github.com/apache/parquet-format/blob/master/VariantShredding.md) is 
written in terms of the Parquet type system, which doesn't distinguish between 
string types like Utf8/LargeUtf8/Utf8View.
   
   So my opinion is that we should (eventually) support those different string 
types, though it doesn't have to be in this PR
   
   Also, maybe it could be something simple: `variant_get` internally knows how 
to extract strings as `Utf8` and then calls the `cast` kernel to convert to one 
of the other string types. We can build specialized codepaths for the other 
types if/when someone needs more performance.
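   
   To make the "extract as `Utf8`, then cast" shape concrete, here is a rough, 
dependency-free sketch. Everything in it (`Utf8Column`, `StringColumn`, 
`StringType`, `extract_as_utf8`, `cast_utf8`, `variant_get_strings`) is a 
hypothetical stand-in invented for illustration, not arrow-rs or 
parquet-variant-compute API. It ignores nulls and models only `LargeUtf8`, 
not `Utf8View`. In the real code the second step would presumably be a call 
to the existing `cast` kernel rather than a hand-written conversion.

```rust
// Toy model only: these types are hypothetical stand-ins, not arrow-rs types.

/// Which string layout the caller asked for.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StringType {
    Utf8,
    LargeUtf8,
}

/// Utf8-style column: i32 offsets into one contiguous byte buffer.
#[derive(Debug, PartialEq)]
struct Utf8Column {
    offsets: Vec<i32>,
    data: Vec<u8>,
}

/// Result column in whichever layout was requested.
#[derive(Debug, PartialEq)]
enum StringColumn {
    Utf8(Utf8Column),
    LargeUtf8 { offsets: Vec<i64>, data: Vec<u8> },
}

/// The single specialized extraction path: always produces the Utf8 layout.
fn extract_as_utf8(values: &[&str]) -> Utf8Column {
    let mut offsets = vec![0i32];
    let mut data = Vec::new();
    for v in values {
        data.extend_from_slice(v.as_bytes());
        // A Utf8 column cannot address more than i32::MAX bytes.
        offsets.push(i32::try_from(data.len()).expect("Utf8 offset overflow"));
    }
    Utf8Column { offsets, data }
}

/// Generic conversion step, standing in for a `cast` kernel.
fn cast_utf8(col: Utf8Column, to: StringType) -> StringColumn {
    match to {
        // Already in the requested layout: nothing to do.
        StringType::Utf8 => StringColumn::Utf8(col),
        // Widen the offsets to i64 and reuse the byte buffer unchanged.
        StringType::LargeUtf8 => StringColumn::LargeUtf8 {
            offsets: col.offsets.into_iter().map(i64::from).collect(),
            data: col.data,
        },
    }
}

/// Extract once as Utf8, then cast to whatever the caller requested.
fn variant_get_strings(values: &[&str], requested: StringType) -> StringColumn {
    cast_utf8(extract_as_utf8(values), requested)
}
```

   The point of the split is that only `extract_as_utf8` needs to understand 
the source data; supporting another output type means adding a cast arm, and 
a dedicated extraction path can replace it later if the extra conversion 
turns out to matter.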



