jonmmease opened a new issue, #5034:
URL: https://github.com/apache/arrow-datafusion/issues/5034
### **Describe the bug**
I'm seeing an error during physical planning for the following query
```sql
SELECT *
FROM "tbl"
INNER JOIN (SELECT "colB", count(DISTINCT "colA") as "colC" FROM "tbl" GROUP
BY "colB") AS "q1"
USING("colB")
```
```
Plan("The left or right side of the join does not have all columns on
\"on\": \nMissing on the left: {}\nMissing on the right: {Column { name:
\"colB\", index: 0 }}")
```
where `tbl` is a table with columns: `colA` and `colB` (both of type
`UInt64`).
Interestingly, planning and query evaluation work properly when the
`DISTINCT` qualifier is removed from the `count` aggregation.
### **Context**
The purpose of this query is to add a new column (`colC`) to the input table
that contains the number of unique values of `colA` that correspond to each
value of `colB`. This is a simplified reproduction of an issue that we're
seeing in VegaFusion's implementation of the Vega
[pivot](https://vega.github.io/vega/docs/transforms/pivot/) transform.
### **To Reproduce**
Here is a Rust test that reproduces the error:
```rust
#[cfg(test)]
mod test_join_plan_bug2 {
use std::sync::Arc;
use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::datasource::empty::EmptyTable;
use datafusion::prelude::SessionContext;
#[tokio::test]
async fn count_distinct_error() {
let schema = Arc::new(Schema::new(vec![
Field::new("colA", DataType::UInt64, true),
Field::new("colB", DataType::UInt64, true),
])) as SchemaRef;
let empty_table = EmptyTable::new(schema);
// Create context and register table
let ctx = SessionContext::new();
ctx.register_table("tbl", Arc::new(empty_table)).unwrap();
let sql1 = r#"
SELECT *
FROM "tbl"
INNER JOIN (SELECT "colB", count(DISTINCT "colA") as "colC" FROM "tbl" GROUP
BY "colB") AS "q1"
USING("colB")
"#;
let logical_plan =
ctx.state().create_logical_plan(sql1).await.unwrap();
let physical_plan =
ctx.state().create_physical_plan(&logical_plan).await.unwrap();
println!("{:#?}", physical_plan);
}
}
```
```
called `Result::unwrap()` on an `Err` value: Plan("The left or right side of
the join does not have all columns on \"on\": \nMissing on the left:
{}\nMissing on the right: {Column { name: \"colB\", index: 0 }}")
thread 'test_join_plan_bug2::count_distinct_error' panicked at 'called
`Result::unwrap()` on an `Err` value: Plan("The left or right side of the join
does not have all columns on \"on\": \nMissing on the left: {}\nMissing on the
right: {Column { name: \"colB\", index: 0 }}")', src/lib.rs:806:83
stack backtrace:
```
### **Expected behavior**
Physical planning should complete without error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]