viirya commented on code in PR #3607:
URL: https://github.com/apache/arrow-rs/pull/3607#discussion_r1088579361
##########
arrow-cast/src/cast.rs:
##########
@@ -3458,6 +3414,44 @@ fn cast_list_inner<OffsetSize: OffsetSizeTrait>(
Ok(Arc::new(list) as ArrayRef)
}
+/// Helper function to cast from `GenericBinaryArray` to `GenericStringArray`. This function performs
+/// UTF8 validation during casting. For invalid UTF8 value, it could be Null or returning `Err` depending
+/// `CastOptions`.
+fn cast_binary_to_generic_string<I, O>(
+ array: &dyn Array,
+ cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError>
+where
+ I: OffsetSizeTrait,
+ O: OffsetSizeTrait,
+{
+ let array = array
+ .as_any()
+ .downcast_ref::<GenericByteArray<GenericBinaryType<I>>>()
+ .unwrap();
+ Ok(Arc::new(
+ array
+ .iter()
+ .map(|maybe_value| match maybe_value {
+ Some(value) => {
+ let result = std::str::from_utf8(value);
Review Comment:
> This will be significantly faster than the approach here as it doesn't
> copy any string data and performs UTF-8 validation in a single pass (not to
> mention less code)
This can only be applied if `CastOptions.safe` is false. If
`CastOptions.safe` is true, we still need to iterate over and validate each
value, because an invalid value becomes null instead of returning `Err` directly.
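
To make the distinction concrete, here is a rough std-only sketch of the two paths. It does not use the arrow-rs types: `BinaryValues` and `cast_to_strings` are hypothetical stand-ins (a contiguous value buffer plus offsets, with the validity bitmap omitted), and the `safe` parameter only mirrors the meaning of `CastOptions.safe` as described above.

```rust
/// Hypothetical stand-in for a binary array: one contiguous value buffer
/// plus offsets. Null slots are not modelled; every slot is non-null.
struct BinaryValues {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

impl BinaryValues {
    /// Builds the buffer and offsets by concatenating the given values.
    fn from_slices(values: &[&[u8]]) -> Self {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for v in values {
            data.extend_from_slice(v);
            offsets.push(data.len());
        }
        Self { offsets, data }
    }
}

/// `safe == false`: validate the whole buffer in one pass and fail on any
/// invalid byte sequence. `safe == true`: validate value by value so that
/// an invalid value becomes `None` instead of an error.
fn cast_to_strings(
    input: &BinaryValues,
    safe: bool,
) -> Result<Vec<Option<&str>>, String> {
    let mut out = Vec::with_capacity(input.offsets.len() - 1);
    if !safe {
        // Single pass over the contiguous buffer, no per-value copy.
        let all = std::str::from_utf8(&input.data).map_err(|e| e.to_string())?;
        for w in input.offsets.windows(2) {
            // A valid buffer is not enough: each value must also start and
            // end on a character boundary.
            if !all.is_char_boundary(w[0]) || !all.is_char_boundary(w[1]) {
                return Err(format!("value at bytes {}..{} splits a character", w[0], w[1]));
            }
            out.push(Some(&all[w[0]..w[1]]));
        }
    } else {
        // Per-value validation is required here, because one bad value must
        // not fail the whole cast.
        for w in input.offsets.windows(2) {
            out.push(std::str::from_utf8(&input.data[w[0]..w[1]]).ok());
        }
    }
    Ok(out)
}
```

With one invalid value in the input, the `safe == false` path returns `Err` for the whole cast, while the `safe == true` path returns a null for that slot and keeps the others, which is why the single-pass shortcut cannot cover both cases.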