alamb commented on code in PR #7365:
URL: https://github.com/apache/arrow-datafusion/pull/7365#discussion_r1303360870
##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -690,6 +690,21 @@ pub fn create_physical_fun(
let func = specializer_func(args)?;
func(args)
}
+ DataType::Binary => {
Review Comment:
I think we can make the coercion more general and not have to add something
specific in the physical implementation
In general the coercion from one type to another is handled in
https://github.com/apache/arrow-datafusion/blob/b6fe8159fd21ea101b84abc2629dbb484c8ad326/datafusion/expr/src/type_coercion/functions.rs#L192
Is there any chance you can try adding a coercion there for binary -> utf8
which would allow DataFusion to automatically try and cast binary --> UTF8 if
users try and call a function or operator that requires a Utf8?
##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1086,6 +1086,13 @@ impl BuiltinScalarFunction {
BuiltinScalarFunction::RegexpReplace => Signature::one_of(
vec![
Exact(vec![Utf8, Utf8, Utf8]),
+ Exact(vec![Binary, Utf8, Utf8]),
Review Comment:
I think it would be better to try and cast the binary --> utf8 at plan time
rather than support a binary argument and then cast at plan time as it is more
general and will apply to other functions
I left another comment below to this effect
##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -1966,7 +1966,7 @@ select
from t_source;
# No groupy
-query error DataFusion error: Internal error: Min/Max accumulator not
implemented for type Binary\. This was likely caused by a bug in DataFusion's
code and we would welcome that you file an bug report in our issue tracker
+statement ok
Review Comment:
I think this line should be changed to `query` rather than `statement` --
otherwise this test is simply ensuring the query doesn't error but not checking
the results
##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1392,8 +1399,8 @@ macro_rules! make_utf8_to_return_type {
($FUNC:ident, $largeUtf8Type:expr, $utf8Type:expr) => {
fn $FUNC(arg_type: &DataType, name: &str) -> Result<DataType> {
Ok(match arg_type {
- DataType::LargeUtf8 => $largeUtf8Type,
- DataType::Utf8 => $utf8Type,
+ DataType::LargeUtf8 | DataType::LargeBinary=> $largeUtf8Type,
Review Comment:
I am not sure about this -- this implies that any function that takes a
binary input will produce binary output which doesn't seem right
##########
datafusion/physical-expr/src/aggregate/min_max.rs:
##########
@@ -1228,6 +1266,66 @@ mod tests {
)
}
+ #[test]
+ fn min_binary() -> Result<()> {
Review Comment:
I actually think the test in aggregate.slt will be sufficient -- we don't
need these extra unit tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]