alamb commented on code in PR #7365:
URL: https://github.com/apache/arrow-datafusion/pull/7365#discussion_r1303360870


##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -690,6 +690,21 @@ pub fn create_physical_fun(
                     let func = specializer_func(args)?;
                     func(args)
                 }
+                DataType::Binary => {

Review Comment:
   I think we can make the coercion more general and not have to add something 
specific in the physical implementation
   
   In general the coercion from one type to another is handled in 
https://github.com/apache/arrow-datafusion/blob/b6fe8159fd21ea101b84abc2629dbb484c8ad326/datafusion/expr/src/type_coercion/functions.rs#L192
   
   Is there any chance you can try adding a coercion there for binary -> utf8 
which would allow DataFusion to automatically try and cast binary --> UTF8 if 
users try and call a function or operator that requires a Utf8?



##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1086,6 +1086,13 @@ impl BuiltinScalarFunction {
             BuiltinScalarFunction::RegexpReplace => Signature::one_of(
                 vec![
                     Exact(vec![Utf8, Utf8, Utf8]),
+                    Exact(vec![Binary, Utf8, Utf8]),

Review Comment:
   I think it would be better to try and cast the binary --> utf8 at plan time 
rather than support a binary argument and then cast at plan time as it is more 
general and will apply to other functions
   
   I left another comment below to this effect



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -1966,7 +1966,7 @@ select
 from t_source;
 
 # No groupy
-query error DataFusion error: Internal error: Min/Max accumulator not 
implemented for type Binary\. This was likely caused by a bug in DataFusion's 
code and we would welcome that you file an bug report in our issue tracker
+statement ok

Review Comment:
   I think this line should be changed to `query` rather than `statement` -- 
otherwise this test is simply ensuring the query doesn't error but not checking 
the results



##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1392,8 +1399,8 @@ macro_rules! make_utf8_to_return_type {
     ($FUNC:ident, $largeUtf8Type:expr, $utf8Type:expr) => {
         fn $FUNC(arg_type: &DataType, name: &str) -> Result<DataType> {
             Ok(match arg_type {
-                DataType::LargeUtf8 => $largeUtf8Type,
-                DataType::Utf8 => $utf8Type,
+                DataType::LargeUtf8 | DataType::LargeBinary=> $largeUtf8Type,

Review Comment:
   I am not sure about this -- this implies that any function that takes a 
binary input will produce binary output which doesn't seem right



##########
datafusion/physical-expr/src/aggregate/min_max.rs:
##########
@@ -1228,6 +1266,66 @@ mod tests {
         )
     }
 
+    #[test]
+    fn min_binary() -> Result<()> {

Review Comment:
   I actually think the test in aggregate.slt will be sufficient -- we don't 
need these extra unit tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to