alamb commented on code in PR #12044:
URL: https://github.com/apache/datafusion/pull/12044#discussion_r1720954368
##########
datafusion/functions/src/unicode/substr.rs:
##########
@@ -168,10 +330,34 @@ where
}
}
+/// Extracts the substring of string starting at the start'th character, and
extending for count characters if that is specified. (Same as substring(string
from start for count).)
+/// substr('alphabet', 3) = 'phabet'
+/// substr('alphabet', 3, 2) = 'ph'
+/// The implementation uses UTF-8 code points as characters
+fn calculate_substr<'a, V, T>(string_array: V, args: &[ArrayRef]) ->
Result<ArrayRef>
Review Comment:
I think this function adds an unecessary level of indirection
Rather than having`substr` dispatch to `calculate_substr` which then
dispatches to `calculate_string` or `calculate_string_view` I suspect the code
would be easier to follow
##########
datafusion/sqllogictest/test_files/arrow_typeof.slt:
##########
@@ -424,7 +424,7 @@ select arrow_cast([1, 2, 3], 'FixedSizeList(3, Int64)');
[1, 2, 3]
# Tests for Utf8View
-query ?T
+query TT
Review Comment:
I believe it is correct -- I actually have a PR to do the same change in
https://github.com/apache/datafusion/pull/12033 but it hasn't merged yet. It is
good to do it in this PR instead. Thank you
##########
datafusion/functions/src/unicode/substr.rs:
##########
@@ -77,7 +78,11 @@ impl ScalarUDFImpl for SubstrFunc {
}
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
- utf8_to_str_type(&arg_types[0], "substr")
+ if arg_types[0] == DataType::Utf8View {
+ Ok(DataType::Utf8View)
+ } else {
+ utf8_to_str_type(&arg_types[0], "substr")
+ }
Review Comment:
Maybe we can make a second function like `optmized_utf8_to_str_type` -- with
comments that make it clear that the difference between that and
`utf8_to_str_type` if if the string function has a specialized implementation
for string view.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]