alamb commented on a change in pull request #9526: URL: https://github.com/apache/arrow/pull/9526#discussion_r579650998
########## File path: rust/arrow/src/compute/kernels/cast.rs ########## @@ -1297,6 +1301,57 @@ fn cast_list_inner<OffsetSize: OffsetSizeTrait>( Ok(Arc::new(list) as ArrayRef) } +/// Helper function to cast from Utf8 > LargeUtf8 and vice versa. If the LargeUtf8 it too large for +/// a Utf8 array it will return an Error. Review comment: ```suggestion /// Helper function to cast from `Utf8` to `LargeUtf8` and vice versa. If the `LargeUtf8` is too large for /// a `Utf8` array it will return an Error. ``` ########## File path: rust/arrow/src/compute/kernels/cast.rs ########## @@ -1778,6 +1833,26 @@ mod tests { assert_eq!(out, vec![Some("1"), Some("2"), Some("3")]); } + #[test] + fn test_str_to_str_casts() { Review comment: I recommend two other tests: 1. The other way as well -- `StringArray` --> `LargeStringArray` 2. Nulls (maybe ` let a = Arc::new(LargeStringArray::from(vec![Some("foo"), None, Some("ham")])) as ArrayRef;` for example) ########## File path: rust/arrow/src/compute/kernels/cast.rs ########## @@ -1297,6 +1301,57 @@ fn cast_list_inner<OffsetSize: OffsetSizeTrait>( Ok(Arc::new(list) as ArrayRef) } +/// Helper function to cast from Utf8 > LargeUtf8 and vice versa. If the LargeUtf8 it too large for +/// a Utf8 array it will return an Error. +fn cast_str_container<OffsetSizeFrom, OffsetSizeTo>(array: &dyn Array) -> Result<ArrayRef> +where + OffsetSizeFrom: StringOffsetSizeTrait + ToPrimitive, + OffsetSizeTo: StringOffsetSizeTrait + NumCast + ArrowNativeType, +{ + let str_array = array + .as_any() + .downcast_ref::<GenericStringArray<OffsetSizeFrom>>() + .unwrap(); + let list_data = array.data(); + let str_values_buf = str_array.value_data(); + + let offsets = unsafe { list_data.buffers()[0].typed_data::<OffsetSizeFrom>() }; + + let mut offset_builder = BufferBuilder::<OffsetSizeTo>::new(offsets.len()); + offsets.iter().try_for_each(|offset| { + let offset = match OffsetSizeTo::from(*offset) { + Some(idx) => idx, + None => { + return Err(ArrowError::ComputeError( + "large-utf8 array too large to cast to utf8-array".into(), + )) + } + }; + offset_builder.append(offset); + Ok(()) + })?; + + offset_builder.append(OffsetSizeTo::from_usize(str_values_buf.len()).unwrap()); Review comment: I am surprised you have to special case the last `str_values_buf.len()` (shouldn't that have already been in the source buffer's offsets already, if it was needed?) ########## File path: rust/arrow/src/compute/kernels/cast.rs ########## @@ -1297,6 +1301,57 @@ fn cast_list_inner<OffsetSize: OffsetSizeTrait>( Ok(Arc::new(list) as ArrayRef) } +/// Helper function to cast from Utf8 > LargeUtf8 and vice versa. If the LargeUtf8 it too large for +/// a Utf8 array it will return an Error. +fn cast_str_container<OffsetSizeFrom, OffsetSizeTo>(array: &dyn Array) -> Result<ArrayRef> +where + OffsetSizeFrom: StringOffsetSizeTrait + ToPrimitive, + OffsetSizeTo: StringOffsetSizeTrait + NumCast + ArrowNativeType, +{ + let str_array = array + .as_any() + .downcast_ref::<GenericStringArray<OffsetSizeFrom>>() + .unwrap(); + let list_data = array.data(); + let str_values_buf = str_array.value_data(); + + let offsets = unsafe { list_data.buffers()[0].typed_data::<OffsetSizeFrom>() }; + + let mut offset_builder = BufferBuilder::<OffsetSizeTo>::new(offsets.len()); + offsets.iter().try_for_each(|offset| { + let offset = match OffsetSizeTo::from(*offset) { + Some(idx) => idx, + None => { + return Err(ArrowError::ComputeError( + "large-utf8 array too large to cast to utf8-array".into(), + )) + } + }; Review comment: You might be able to use something like `ok_or_else` here if you wanted to avoid some repetition / be more idomatic: ```suggestion let offset = OffsetSizeTo::from(*offset).ok_or_else(|| { ArrowError::ComputeError( "large-utf8 array too large to cast to utf8-array".into(), ) })?; ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org