cseidman opened a new issue, #10013:
URL: https://github.com/apache/arrow-datafusion/issues/10013

   ### Describe the bug
   
   I found a case where a query fails when there is both a HAVING clause and an 
ORDER BY clause. The query will work if it has either the HAVING clause or the 
ORDER BY clause .. but not both.
   
   Here is a failing query:
   ```sql
   with revs
   as (
       SELECT
           customer,
           cast(round(CAST(substring(revenue,2) AS float)) as INT) AS 
visit_revenue
       FROM visits
   )
   SELECT
           customer,
           SUM(visit_revenue) as total_revenue
   FROM revs
   group by customer
   having SUM(visit_revenue) > 999
   order by SUM(visit_revenue)
   ```
   The error I get is: 
   ```shell
   Error: SchemaError(FieldNotFound { field: Column { relation: None, name: 
"visit_revenue" }, valid_fields: [Column { relation: Some(Bare { table: "revs" 
}), name: "customer" }, Column { relation: None, name: "total_revenue" }, 
Column { relation: Some(Bare { table: "revs" }), name: "customer" }, Column { 
relation: None, name: "SUM(revs.visit_revenue)" }] }, Some(""))
   ```
   
   If I comment out either the line with `having SUM(visit_revenue) > 999` OR 
`order by SUM(visit_revenue)` the the query will succeed.
   
   
   ### To Reproduce
   
   To reproduce, run this program with the attached CSV file as data.
   
   ```rust
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::prelude::{CsvReadOptions, SessionContext};
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
       let file_name = "data/test.csv";
   
       let schema = Schema::new(vec![
           Field::new("date", DataType::Utf8, false),
           Field::new("weekday", DataType::Utf8, false),
           Field::new("time", DataType::Utf8, false),
           Field::new("customer", DataType::Utf8, false),
           Field::new("revenue",  DataType::Utf8, false),
       ]);
   
       let options = CsvReadOptions::new()
           .schema(&schema)
           .delimiter(b'\t')
           .has_header(true)
           ;
   
       let ctx = SessionContext::new();
       ctx.register_csv("visits", file_name, options).await?;
   
       let df = ctx.sql(r#"
                   with revs
                   as (
                       SELECT
                           customer,
                           cast(round(CAST(substring(revenue,2) AS float)) as 
INT) AS visit_revenue
                       FROM visits
                   )
                   SELECT
                           customer,
                           SUM(visit_revenue) as total_revenue
                   FROM revs
                   group by customer
                   having SUM(visit_revenue) > 999
                   order by SUM(visit_revenue)
                   "#).await?;
   
       df.show().await?;
       Ok(())
   }
   ```
   The attached file contains the data:
   
[test.csv](https://github.com/apache/arrow-datafusion/files/14918379/test.csv)
   
   
   ### Expected behavior
   
   The expectation is that the result set is both filtered for records where 
the sum of `visit_revenue` > 999 AND ordered by the same without crashing. 
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to