alamb commented on code in PR #8600:
URL: https://github.com/apache/arrow-rs/pull/8600#discussion_r2442296788
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -763,6 +764,13 @@ mod test {
BooleanArray::from(vec![Some(true), Some(false), Some(true)])
);
+ perfectly_shredded_to_arrow_primitive_test!(
+ get_variant_perfectly_shredded_utf8_as_utf8,
+ DataType::Utf8,
Review Comment:
I think it is perfectly reasonable to call `variant_get` and ask for the
output to be `LargeUtf8` or `Utf8View`
The Shredding Spec
(https://github.com/apache/parquet-format/blob/master/VariantShredding.md) is
written in terms of the Parquet type system, which doesn't distinguish between
string types like Utf8/LargeUtf8/Utf8View
So my opinion is that we should (eventually) support those different string
types, though it doesn't have to be in this PR
Also, maybe it could be something simple: `variant_get` internally
knows how to extract strings as `Utf8` and then calls the `cast` kernel to cast
to one of the other string types. We can build specialized codepaths for the
other types if/when someone needs more performance
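
To make the "extract as `Utf8`, then cast" idea concrete, here is a rough, dependency-free sketch. Everything in it is a hypothetical stand-in: `Utf8Column`, `LargeUtf8Column`, `Requested`, `Output`, `extract_as_utf8`, `cast_to_large` and `variant_get_strings` are toy names, not the arrow-rs or `parquet-variant-compute` API. Nulls and `Utf8View` are left out for brevity. In real code the widening step would presumably be a call to the existing `cast` kernel rather than hand-written.

```rust
/// Toy model of an Arrow `Utf8` column: 32-bit offsets into one byte buffer.
/// (Hypothetical stand-in, not the arrow-rs `StringArray`; nulls are omitted.)
#[derive(Debug, PartialEq)]
struct Utf8Column {
    offsets: Vec<i32>,
    values: String,
}

/// Toy model of an Arrow `LargeUtf8` column: same layout with 64-bit offsets.
#[derive(Debug, PartialEq)]
struct LargeUtf8Column {
    offsets: Vec<i64>,
    values: String,
}

/// String output types a caller might request (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum Requested {
    Utf8,
    LargeUtf8,
}

/// Result of the sketch: one variant per supported output type.
#[derive(Debug, PartialEq)]
enum Output {
    Utf8(Utf8Column),
    LargeUtf8(LargeUtf8Column),
}

/// The one specialized path: build a `Utf8` column from the shredded strings.
/// The `as i32` conversion assumes the total byte length fits in 32 bits.
fn extract_as_utf8(shredded: &[&str]) -> Utf8Column {
    let mut offsets = vec![0i32];
    let mut values = String::new();
    for s in shredded {
        values.push_str(s);
        offsets.push(values.len() as i32);
    }
    Utf8Column { offsets, values }
}

/// Stand-in for the generic cast step: widen the offsets, reuse the bytes.
fn cast_to_large(col: Utf8Column) -> LargeUtf8Column {
    LargeUtf8Column {
        offsets: col.offsets.into_iter().map(i64::from).collect(),
        values: col.values,
    }
}

/// Sketch of the dispatch: always extract as `Utf8`, cast only if asked.
fn variant_get_strings(shredded: &[&str], requested: Requested) -> Output {
    let utf8 = extract_as_utf8(shredded);
    match requested {
        Requested::Utf8 => Output::Utf8(utf8),
        Requested::LargeUtf8 => Output::LargeUtf8(cast_to_large(utf8)),
    }
}
```

Calling `variant_get_strings(&["foo", "ba"], Requested::LargeUtf8)` yields offsets `[0, 3, 5]` over the bytes `"fooba"`, the same as the `Utf8` result but with 64-bit offsets. The point of the design is that only one extraction path needs to understand shredded variants, and every other string type comes from a generic conversion.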