alamb opened a new issue, #4854:
URL: https://github.com/apache/arrow-datafusion/issues/4854

   **Describe the bug**
   A clear and concise description of what the bug is.
   
   **To Reproduce**
   using datafusion-cli load some data:
   
   ```sql
   ❯ create or replace table t as select column1 as value, column2 as time from 
(select * from (values
     (1, timestamp '2022-01-01 00:00:00'),
     (2, timestamp '2022-01-01 01:00:00'),
     (3, timestamp '2022-01-02 00:00:00')
   ) as sq) as sq
   ;
   
   0 rows in set. Query took 0.001 seconds.
   ❯ select *  from t;
   +-------+---------------------+
   | value | time                |
   +-------+---------------------+
   | 1     | 2022-01-01T00:00:00 |
   | 2     | 2022-01-01T01:00:00 |
   | 3     | 2022-01-02T00:00:00 |
   +-------+---------------------+
   3 rows in set. Query took 0.001 seconds.
   ❯ ;
   ```
   
   `GROUP BY "time"` works:
   
   ```sql
   ❯ select
     sum(value) AS "value",
     date_trunc('hour',time) AS "time"
   FROM t
   GROUP BY time;
   +-------+---------------------+
   | value | time                |
   +-------+---------------------+
   | 3     | 2022-01-02T00:00:00 |
   | 2     | 2022-01-01T01:00:00 |
   | 1     | 2022-01-01T00:00:00 |
   +-------+---------------------+
   3 rows in set. Query took 0.002 seconds.
   ```
   
   However, `GROUP BY "time" ORDER BY "time"` does not work:
   
   ```sql
   ❯ select
     sum(value) AS "value",
     date_trunc('month',time) AS "time"
   FROM t
   GROUP BY time
   ORDER BY time;
   
   SchemaError(AmbiguousReference { qualifier: Some("t"), name: "time" })
   ```
   
   **Expected behavior**
   I expect the query to run and produce ordered results like postgres:
   
   ```sql
   postgres=# select
     sum(value) AS "value",
     date_trunc('month',time) AS "time"
   FROM t
   GROUP BY time
   ORDER BY time;
    value |        time         
   -------+---------------------
        3 | 2022-01-01 00:00:00
        2 | 2022-01-01 00:00:00
        1 | 2022-01-01 00:00:00
   (3 rows)
   ```
   
   **Additional context**
   
   Note I was confused about why this query wasn't actually aggregating on the 
truncated date (as in why are there 3 output rows rather than 1 output row). 
The reason is that the `GROUP BY time` is (correctly) evaluated before the 
`date_trunc` function evaluated in the select list.
   
   To group by the truncated date, you  need to group by 
`date_trunc('month',time) AS "time"` , which you can do using `2` 
   
   ```sql
   select
     sum(value) AS "value",
     date_trunc('month',time) AS "time"
   FROM t
   GROUP BY 2
   ORDER BY 2;
   +-------+---------------------+
   | value | time                |
   +-------+---------------------+
   | 6     | 2022-01-01T00:00:00 |
   +-------+---------------------+
   1 row in set. Query took 0.005 seconds.
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to