jonmmease opened a new issue, #4864:
URL: https://github.com/apache/arrow-datafusion/issues/4864
**Describe the bug**
I'm seeing some unintuitive behavior around type coercion for UDFs that
take integer inputs:
1. float values are passed through to the UDF without error and without
coercion to integer type.
2. When the argument is an expression that adds a float and an int, the
physical planner raises an error: "The type of Float64 + Int64 of binary
physical should be same"
**To Reproduce**
Here is a self-contained example that reproduces both issues:
```rust
#[cfg(test)]
mod tests_types {
    use std::sync::Arc;
    use datafusion::arrow::array::{ArrayRef, Float64Array, StringArray, TimestampMillisecondArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef, TimeUnit};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::arrow::util::pretty::pretty_format_batches;
    use datafusion::datasource::MemTable;
    use datafusion::logical_expr::{
        ColumnarValue, ReturnTypeFunction, ScalarFunctionImplementation, ScalarUDF, Signature,
        Volatility,
    };
    use datafusion::prelude::SessionContext;

    #[tokio::test]
    async fn test() {
        // Create context
        let ctx = SessionContext::new();

        // Register custom UDF
        ctx.register_udf(make_int_udf());

        // Perform query 1.
        // The UDF is called with Float64 arguments rather than raising an error
        // or coercing the float to an integer
        let res = ctx.sql(r#"
        SELECT int_udf(1.0)
        "#).await.unwrap().collect().await.unwrap();
        let formatted = pretty_format_batches(res.as_slice()).unwrap();
        println!("{}", formatted);

        // Perform query 2.
        // An error is raised:
        // "The type of Float64 + Int64 of binary physical should be same"
        let res = ctx.sql(r#"
        SELECT int_udf(1.0 + 0)
        "#).await.unwrap().collect().await.unwrap();
    }

    pub fn make_int_udf() -> ScalarUDF {
        // Identity implementation: returns the first argument unchanged
        let datetime_components: ScalarFunctionImplementation =
            Arc::new(move |args: &[ColumnarValue]| {
                return Ok(args[0].clone())
            });

        // The declared return type is always Int64
        let return_type: ReturnTypeFunction =
            Arc::new(move |_| Ok(Arc::new(DataType::Int64)));

        // The signature accepts exactly one Int64 argument
        let signature = Signature::exact(
            vec![
                DataType::Int64, // month
            ],
            Volatility::Immutable,
        );

        ScalarUDF::new(
            "int_udf",
            &signature,
            &return_type,
            &datetime_components,
        )
    }
}
```
Output
```
+---------------------+
| int_udf(Float64(1)) |
+---------------------+
| 1 |
+---------------------+
called `Result::unwrap()` on an `Err` value: Internal("The type of Float64 +
Int64 of binary physical should be same")
thread 'tests_types::test' panicked at 'called `Result::unwrap()` on an
`Err` value: Internal("The type of Float64 + Int64 of binary physical should be
same")', src/lib.rs:270:40
stack backtrace:
```
**Expected behavior**
For query 1, I don't know if the intention is to coerce the float to an int,
but I would expect either:
- An error stating that a Float64 cannot be coerced to an Int64
- The UDF to be called with an Int64 array
For query 2, I would expect either of the outcomes above, but not an error
in the physical planner.
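
To make the two acceptable outcomes concrete, here is a minimal std-only sketch. It does not use DataFusion's API: `Ty`, `Value`, `check_exact`, and `coerce` are hypothetical names invented for illustration, and the rule that only whole-number floats coerce to `Int64` is an assumption, not a statement of what DataFusion does or should do.

```rust
// Hypothetical stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(i64),
    Float64(f64),
}

impl Value {
    fn ty(&self) -> Ty {
        match self {
            Value::Int64(_) => Ty::Int64,
            Value::Float64(_) => Ty::Float64,
        }
    }
}

/// Outcome 1: reject any argument whose type differs from the declared one.
fn check_exact(expected: Ty, arg: &Value) -> Result<Value, String> {
    if arg.ty() == expected {
        Ok(arg.clone())
    } else {
        Err(format!("{:?} cannot be coerced to {:?}", arg.ty(), expected))
    }
}

/// Outcome 2: convert the argument to the declared type before the call.
/// Assumption: a float coerces to Int64 only when it has no fractional part
/// (out-of-range values would saturate under `as`; ignored in this sketch).
fn coerce(expected: Ty, arg: &Value) -> Result<Value, String> {
    match (expected, arg) {
        (Ty::Int64, Value::Float64(f)) if f.fract() == 0.0 => Ok(Value::Int64(*f as i64)),
        (Ty::Float64, Value::Int64(i)) => Ok(Value::Float64(*i as f64)),
        // Same type passes through; everything else is an error.
        _ => check_exact(expected, arg),
    }
}

fn main() {
    // Mirrors `int_udf(1.0)`: a Float64 argument against an Int64 signature.
    let arg = Value::Float64(1.0);
    println!("{:?}", check_exact(Ty::Int64, &arg));
    println!("{:?}", coerce(Ty::Int64, &arg));
}
```

Either function gives the UDF a guarantee about its input: it is called with an `Int64` or not at all. What query 1 currently shows is a third path, where the `Float64` value reaches the UDF unchanged.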
**Additional context**
A similar error message was reported by @andygrove in [The physical planner
error message](https://github.com/apache/arrow-datafusion/issues/4763)
Maybe related to https://github.com/apache/arrow-datafusion/issues/4615?