cseidman opened a new issue, #10013:
URL: https://github.com/apache/arrow-datafusion/issues/10013
### Describe the bug
I found a case where a query fails when there is both a HAVING clause and an
ORDER BY clause. The query will work if it has either the HAVING clause or the
ORDER BY clause .. but not both.
Here is a failing query:
```sql
with revs
as (
SELECT
customer,
cast(round(CAST(substring(revenue,2) AS float)) as INT) AS
visit_revenue
FROM visits
)
SELECT
customer,
SUM(visit_revenue) as total_revenue
FROM revs
group by customer
having SUM(visit_revenue) > 999
order by SUM(visit_revenue)
```
The error I get is:
```shell
Error: SchemaError(FieldNotFound { field: Column { relation: None, name:
"visit_revenue" }, valid_fields: [Column { relation: Some(Bare { table: "revs"
}), name: "customer" }, Column { relation: None, name: "total_revenue" },
Column { relation: Some(Bare { table: "revs" }), name: "customer" }, Column {
relation: None, name: "SUM(revs.visit_revenue)" }] }, Some(""))
```
If I comment out either the line with `having SUM(visit_revenue) > 999` OR
`order by SUM(visit_revenue)` the the query will succeed.
### To Reproduce
To reproduce, run this program with the attached CSV file as data.
```rust
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::prelude::{CsvReadOptions, SessionContext};
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
let file_name = "data/test.csv";
let schema = Schema::new(vec![
Field::new("date", DataType::Utf8, false),
Field::new("weekday", DataType::Utf8, false),
Field::new("time", DataType::Utf8, false),
Field::new("customer", DataType::Utf8, false),
Field::new("revenue", DataType::Utf8, false),
]);
let options = CsvReadOptions::new()
.schema(&schema)
.delimiter(b'\t')
.has_header(true)
;
let ctx = SessionContext::new();
ctx.register_csv("visits", file_name, options).await?;
let df = ctx.sql(r#"
with revs
as (
SELECT
customer,
cast(round(CAST(substring(revenue,2) AS float)) as
INT) AS visit_revenue
FROM visits
)
SELECT
customer,
SUM(visit_revenue) as total_revenue
FROM revs
group by customer
having SUM(visit_revenue) > 999
order by SUM(visit_revenue)
"#).await?;
df.show().await?;
Ok(())
}
```
The attached file contains the data:
[test.csv](https://github.com/apache/arrow-datafusion/files/14918379/test.csv)
### Expected behavior
The expectation is that the result set is both filtered for records where
the sum of `visit_revenue` > 999 AND ordered by the same without crashing.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]