alamb opened a new issue, #3685:
URL: https://github.com/apache/arrow-datafusion/issues/3685
**Describe the bug**
Type coercion does not work for a column of type `DataType::Dictionary(Int16, Utf8)`
(a dictionary-encoded string column) when the column is compared to an
integer.
So a predicate such as
```sql
where vendor_id = 12345
```
works if `vendor_id` is `Utf8`, but fails with an internal error if
`vendor_id` is dictionary-encoded.
**To Reproduce**
```rust
use std::sync::Arc;

use datafusion::{
    arrow::{
        array::{DictionaryArray, StringArray},
        datatypes::Int16Type,
        record_batch::RecordBatch,
    },
    prelude::SessionContext,
};

// Compare the same string values, stored once as plain Utf8 and once as a
// dictionary, against an integer literal
#[tokio::main]
async fn main() {
    let vendor_id_utf8: StringArray = vec![Some("124"), Some("345")]
        .into_iter()
        .collect();

    let vendor_id_dict: DictionaryArray<Int16Type> = vec![Some("124"), Some("345")]
        .into_iter()
        .collect();

    let batch = RecordBatch::try_from_iter(vec![
        ("vendor_id_utf8", Arc::new(vendor_id_utf8) as _),
        ("vendor_id_dict", Arc::new(vendor_id_dict) as _),
    ])
    .unwrap();

    // register as a table
    let ctx = SessionContext::new();
    ctx.register_batch("t", batch).unwrap();

    // Query with a predicate comparing a string column to an int64 works fine
    // +----------------+----------------+
    // | vendor_id_utf8 | vendor_id_dict |
    // +----------------+----------------+
    // | 124            | 124            |
    // +----------------+----------------+
    ctx.sql("SELECT * from t where vendor_id_utf8 = 124")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();

    // However, when the predicate is on the same values encoded as a
    // dictionary, we get an error:
    // Internal("The type of Dictionary(Int16, Utf8) = Int64 of binary
    // physical should be same"
    ctx.sql("SELECT * from t where vendor_id_dict = 124")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();

    // You can work around the issue by using an explicit cast
    // +----------------+----------------+
    // | vendor_id_utf8 | vendor_id_dict |
    // +----------------+----------------+
    // | 124            | 124            |
    // +----------------+----------------+
    ctx.sql("SELECT * from t where cast(vendor_id_dict as bigint) = 124")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();
}
```
**Expected behavior**
1. The query against the dictionary column should execute and give the same
answer as the query against the `Utf8` column
2. If type coercion is not possible, it should throw an error during
planning (not the internal error) that the types could not be coerced.
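
As a rough illustration of the expected behavior, here is a minimal, self-contained sketch of a comparison coercion rule that looks through dictionary encoding. It does not use DataFusion's actual coercion code: the `DataType` enum is a simplified stand-in for arrow's, `value_type` and `coerce_for_comparison` are hypothetical names, and the rule that a string compared with an integer coerces to `Utf8` is an assumption made only for this example.

```rust
// Simplified, hypothetical stand-in for arrow's `DataType` (not the real enum).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Boolean,
    Int16,
    Int64,
    Utf8,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Look through dictionary encoding: for comparison purposes a dictionary
/// column behaves like a column of its value type.
fn value_type(t: &DataType) -> &DataType {
    match t {
        DataType::Dictionary(_, value) => value_type(value),
        other => other,
    }
}

/// Hypothetical comparison coercion. Returns the common type both sides
/// would be cast to, or an error message that could be reported at planning
/// time instead of surfacing as an internal error during execution.
fn coerce_for_comparison(lhs: &DataType, rhs: &DataType) -> Result<DataType, String> {
    use DataType::*;
    match (value_type(lhs), value_type(rhs)) {
        // Identical value types need no coercion.
        (a, b) if a == b => Ok(a.clone()),
        // Assumption for this sketch: string vs integer compares as strings.
        (Utf8, Int16 | Int64) | (Int16 | Int64, Utf8) => Ok(Utf8),
        // Mixed integer widths widen to the larger type.
        (Int16, Int64) | (Int64, Int16) => Ok(Int64),
        (a, b) => Err(format!(
            "cannot coerce {:?} and {:?} to a common type for comparison",
            a, b
        )),
    }
}
```

Under these assumptions, `Dictionary(Int16, Utf8) = Int64` resolves to the same common type as `Utf8 = Int64`, and a pair with no common type (for example a dictionary of strings compared with `Boolean`) produces an error that names both types.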