appletreeisyellow opened a new pull request, #7262: URL: https://github.com/apache/arrow-datafusion/pull/7262
## Which issue does this PR close?

Closes #5471.

## Rationale for this change

Running `upper(col)` where `col` is a dictionary results in an internal error:

```
Internal error: The "upper" function can only accept strings.. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
```

Other functions, such as `length` and `character_length`, have the same issue. Here is a list of all the affected functions:

```
$ grep utf8_to_str_type datafusion/expr/src/built_in_function.rs
600: utf8_to_str_type(&input_expr_types[0], "btrim")
628: utf8_to_str_type(&input_expr_types[0], "initcap")
630: BuiltinScalarFunction::Left => utf8_to_str_type(&input_expr_types[0], "left"),
632: utf8_to_str_type(&input_expr_types[0], "lower")
634: BuiltinScalarFunction::Lpad => utf8_to_str_type(&input_expr_types[0], "lpad"),
636: utf8_to_str_type(&input_expr_types[0], "ltrim")
638: BuiltinScalarFunction::MD5 => utf8_to_str_type(&input_expr_types[0], "md5"),
651: utf8_to_str_type(&input_expr_types[0], "regex_replace")
654: utf8_to_str_type(&input_expr_types[0], "repeat")
657: utf8_to_str_type(&input_expr_types[0], "replace")
660: utf8_to_str_type(&input_expr_types[0], "reverse")
663: utf8_to_str_type(&input_expr_types[0], "right")
665: BuiltinScalarFunction::Rpad => utf8_to_str_type(&input_expr_types[0], "rpad"),
667: utf8_to_str_type(&input_expr_types[0], "rtrimp")
711: utf8_to_str_type(&input_expr_types[0], "split_part")
718: utf8_to_str_type(&input_expr_types[0], "substr")
740: utf8_to_str_type(&input_expr_types[0], "translate")
742: BuiltinScalarFunction::Trim => utf8_to_str_type(&input_expr_types[0], "trim"),
744: utf8_to_str_type(&input_expr_types[0], "upper")
```

```
$ grep utf8_to_int_type datafusion/expr/src/built_in_function.rs
597: utf8_to_int_type(&input_expr_types[0], "bit_length")
603: utf8_to_int_type(&input_expr_types[0], "character_length")
645: utf8_to_int_type(&input_expr_types[0], "octet_length")
715: utf8_to_int_type(&input_expr_types[0], "strpos")
```

## What changes are included in this PR?

Support the `Dictionary` data type in the string functions (those whose return type is resolved by `utf8_to_str_type`) and the int functions (those resolved by `utf8_to_int_type`).

## Are these changes tested?

Yes, tests are added for all the functions listed above.

## Are there any user-facing changes?

Users will be able to call `upper` and the other string and int functions listed above on a column `col` that is a dictionary without hitting this internal error.
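
The idea behind this change can be sketched roughly as follows. This is a minimal, self-contained model and not the PR's actual diff: the `DataType` enum is a simplified, hypothetical stand-in for Arrow's `DataType`, and the function bodies, including the `Utf8` to `Int32` and `LargeUtf8` to `Int64` mappings, are assumptions about how a dictionary's value type might be unwrapped when a function's return type is resolved.

```rust
// Hypothetical, simplified stand-in for Arrow's `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    LargeUtf8,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

// Return type for functions that produce a string (e.g. `upper`, `lower`).
// Assumption: a dictionary input resolves to the return type of its value type.
fn utf8_to_str_type(arg_type: &DataType, name: &str) -> Result<DataType, String> {
    match arg_type {
        DataType::Utf8 => Ok(DataType::Utf8),
        DataType::LargeUtf8 => Ok(DataType::LargeUtf8),
        // Look through the dictionary encoding to the values it stores.
        DataType::Dictionary(_, value_type) => utf8_to_str_type(value_type, name),
        _ => Err(format!(
            "The \"{}\" function can only accept strings.",
            name
        )),
    }
}

// Return type for functions that produce an integer (e.g. `character_length`).
// Assumption: Utf8 maps to Int32 and LargeUtf8 maps to Int64.
fn utf8_to_int_type(arg_type: &DataType, name: &str) -> Result<DataType, String> {
    match arg_type {
        DataType::Utf8 => Ok(DataType::Int32),
        DataType::LargeUtf8 => Ok(DataType::Int64),
        // Same unwrapping as above, applied to the int-returning functions.
        DataType::Dictionary(_, value_type) => utf8_to_int_type(value_type, name),
        _ => Err(format!(
            "The \"{}\" function can only accept strings.",
            name
        )),
    }
}
```

In this sketch, `Dictionary(Int32, Utf8)` passed to `utf8_to_str_type` resolves to `Utf8` instead of falling through to the error arm, which is the arm that produces the message quoted in the rationale. Recursing on the value type is a design choice of the sketch; it also rejects dictionaries whose values are not strings.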
