neilconway commented on code in PR #20616:
URL: https://github.com/apache/datafusion/pull/20616#discussion_r2867533047
##########
datafusion/functions/src/string/common.rs:
##########
@@ -332,17 +332,34 @@ fn string_trim<T: OffsetSizeTrait, Tr: Trimmer>(args:
&[ArrayRef]) -> Result<Arr
}
pub(crate) fn to_lower(args: &[ColumnarValue], name: &str) ->
Result<ColumnarValue> {
- case_conversion(args, |string| string.to_lowercase(), name)
+ case_conversion(
+ args,
+ |string| string.to_lowercase(),
+ name,
+ Utf8ViewOutput::Utf8View,
+ )
}
pub(crate) fn to_upper(args: &[ColumnarValue], name: &str) ->
Result<ColumnarValue> {
- case_conversion(args, |string| string.to_uppercase(), name)
+ case_conversion(
+ args,
+ |string| string.to_uppercase(),
+ name,
+ Utf8ViewOutput::Utf8,
+ )
+}
+
+#[derive(Debug, Clone, Copy)]
+enum Utf8ViewOutput {
Review Comment:
Why do we need a new enum -- can't we just pass the return data type the
caller expects?
##########
datafusion/functions/src/string/common.rs:
##########
@@ -358,20 +375,38 @@ where
>(array, op)?)),
DataType::Utf8View => {
let string_array = as_string_view_array(array)?;
- let mut string_builder = StringBuilder::with_capacity(
- string_array.len(),
- string_array.get_array_memory_size(),
- );
-
- for str in string_array.iter() {
- if let Some(str) = str {
- string_builder.append_value(op(str));
- } else {
- string_builder.append_null();
+ match utf8view_output {
+ Utf8ViewOutput::Utf8 => {
Review Comment:
I'm confused why we need to add code to handle the "`Utf8View` input, `Utf8`
output" case -- seems like we don't actually want that behavior. If we fixed up
`upper` and `lower` as part of the same PR, couldn't we just have
`case_conversion` return a value of the same type as its input? That would
avoid this redundant code and also get rid of the `utf8view_output` parameter.
##########
datafusion/functions/src/string/lower.rs:
##########
@@ -82,7 +82,11 @@ impl ScalarUDFImpl for LowerFunc {
}
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
- utf8_to_str_type(&arg_types[0], "lower")
+ if arg_types[0] == DataType::Utf8View {
+ Ok(DataType::Utf8View)
+ } else {
+ utf8_to_str_type(&arg_types[0], "lower")
Review Comment:
I wonder if it would be helpful to have a helper here that handles all three
string representations and returns the optimal return type. Having a helper for
Utf8 vs. LargeUtf8 but handling Utf8View at the call-site seems a bit odd.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]