jorgecarleitao commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r584924393



##########
File path: rust/datafusion/src/physical_plan/string_expressions.rs
##########
@@ -361,10 +534,167 @@ pub fn ltrim<T: StringOffsetSizeTrait>(args: 
&[ArrayRef]) -> Result<ArrayRef> {
     }
 }
 
-/// Converts the string to all lower case.
-/// lower('TOM') = 'tom'
-pub fn lower(args: &[ColumnarValue]) -> Result<ColumnarValue> {
-    handle(args, |x| x.to_ascii_lowercase(), "lower")
+/// Returns last n characters in the string, or when n is negative, returns 
all but first |n| characters.
+/// right('abcde', 2) = 'de'
+pub fn right<T: StringOffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
+    let string_array: &GenericStringArray<T> = args[0]
+        .as_any()
+        .downcast_ref::<GenericStringArray<T>>()
+        .ok_or_else(|| {
+            DataFusionError::Internal("could not cast string to 
StringArray".to_string())
+        })?;
+
+    let n_array: &Int64Array =
+        args[1]
+            .as_any()
+            .downcast_ref::<Int64Array>()
+            .ok_or_else(|| {
+                DataFusionError::Internal("could not cast n to 
Int64Array".to_string())
+            })?;
+
+    let result = string_array
+        .iter()
+        .enumerate()
+        .map(|(i, x)| {
+            if n_array.is_null(i) {
+                None
+            } else {
+                x.map(|x: &str| {
+                    let n: i64 = n_array.value(i);

Review comment:
       First of all, thanks a lot for this, @seddonm1 . Impressive.
   
   I do not have time to go through all of this, but I will try to comment over 
what I find.
   
   Because we do not check that the length of the two arrays are equal, I think 
that this will read out of bounds whenever `n_array.len() < string_array.len()`.
   
   I would recommend tthat we do not use `.value` or `is_null` and instead only 
use the corresponding iterators and `zip`. The reason being that even if the 
iterators have a different length, the resulting code is still sound as the 
zipped iterator stops at the shortest iterator.
   
   More generally, I think we should set a specific guideline and not allow new 
code to use these `unsafe`-yet-safe APIs, or better, just mark them as 
deprecated so that people use the corresponding safe ones.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to