jorgecarleitao commented on a change in pull request #9565: URL: https://github.com/apache/arrow/pull/9565#discussion_r584924393
########## File path: rust/datafusion/src/physical_plan/string_expressions.rs ########## @@ -361,10 +534,167 @@ pub fn ltrim<T: StringOffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { } } -/// Converts the string to all lower case. -/// lower('TOM') = 'tom' -pub fn lower(args: &[ColumnarValue]) -> Result<ColumnarValue> { - handle(args, |x| x.to_ascii_lowercase(), "lower") +/// Returns last n characters in the string, or when n is negative, returns all but first |n| characters. +/// right('abcde', 2) = 'de' +pub fn right<T: StringOffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { + let string_array: &GenericStringArray<T> = args[0] + .as_any() + .downcast_ref::<GenericStringArray<T>>() + .ok_or_else(|| { + DataFusionError::Internal("could not cast string to StringArray".to_string()) + })?; + + let n_array: &Int64Array = + args[1] + .as_any() + .downcast_ref::<Int64Array>() + .ok_or_else(|| { + DataFusionError::Internal("could not cast n to Int64Array".to_string()) + })?; + + let result = string_array + .iter() + .enumerate() + .map(|(i, x)| { + if n_array.is_null(i) { + None + } else { + x.map(|x: &str| { + let n: i64 = n_array.value(i); Review comment: First of all, thanks a lot for this, @seddonm1 . Impressive. I do not have time to go through all of this, but I will try to comment over what I find. Because we do not check that the length of the two arrays are equal, I think that this will read out of bounds whenever `n_array.len() < string_array.len()`. I would recommend tthat we do not use `.value` or `is_null` and instead only use the corresponding iterators and `zip`. The reason being that even if the iterators have a different length, the resulting code is still sound as the zipped iterator stops at the shortest iterator. More generally, I think we should set a specific guideline and not allow new code to use these `unsafe`-yet-safe APIs, or better, just mark them as deprecated so that people use the corresponding safe ones. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org