alamb commented on code in PR #12044:
URL: https://github.com/apache/datafusion/pull/12044#discussion_r1720954368


##########
datafusion/functions/src/unicode/substr.rs:
##########
@@ -168,10 +330,34 @@ where
     }
 }
 
+/// Extracts the substring of string starting at the start'th character, and 
extending for count characters if that is specified. (Same as substring(string 
from start for count).)
+/// substr('alphabet', 3) = 'phabet'
+/// substr('alphabet', 3, 2) = 'ph'
+/// The implementation uses UTF-8 code points as characters
+fn calculate_substr<'a, V, T>(string_array: V, args: &[ArrayRef]) -> 
Result<ArrayRef>

Review Comment:
   I think this function adds an unecessary level of indirection 
   
   Rather than having`substr` dispatch to `calculate_substr` which then 
dispatches to `calculate_string` or `calculate_string_view` I suspect the code 
would be easier to follow



##########
datafusion/sqllogictest/test_files/arrow_typeof.slt:
##########
@@ -424,7 +424,7 @@ select arrow_cast([1, 2, 3], 'FixedSizeList(3, Int64)');
 [1, 2, 3]
 
 # Tests for Utf8View
-query ?T
+query TT

Review Comment:
   I believe it is correct -- I actually have a PR to do the same change in 
https://github.com/apache/datafusion/pull/12033 but it hasn't merged yet. It is 
good to do it in this PR instead. Thank you



##########
datafusion/functions/src/unicode/substr.rs:
##########
@@ -77,7 +78,11 @@ impl ScalarUDFImpl for SubstrFunc {
     }
 
     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        utf8_to_str_type(&arg_types[0], "substr")
+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)
+        } else {
+            utf8_to_str_type(&arg_types[0], "substr")
+        }

Review Comment:
   Maybe we can make a second function like `optmized_utf8_to_str_type` -- with 
comments that make it clear that the difference between that and 
`utf8_to_str_type` if if the string function has a specialized implementation 
for string view. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to