alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629071485

   I have verified this has been fixed on `main` (which will be released as 
DataFusion `28.0.0`).
   
   BTW, I added new test coverage in 
https://github.com/apache/arrow-datafusion/pull/6836 so that we don't break 
this again by accident.
   
   Since it is a regression, I would be willing to create a patch release 
(`27.0.1`) with the fix if that would be helpful for others.
   
   I verified using this query (thanks for the reproducer, @maxburke 🙏):
   
   ```sql
   SELECT
     "day" AS "date",
     count(DISTINCT "direction") AS "num_directions"
   FROM 'test_data.parquet'
   GROUP BY "day"
   ORDER BY "day" ASC;
   ```
   
   ## `26.0.0` works
   ```shell
   DataFusion CLI v26.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   +---------------------+----------------+
   | date                | num_directions |
   +---------------------+----------------+
   | 2011-09-09T00:00:00 | 2              |
   | 2011-09-10T00:00:00 | 2              |
   ...
   | 2018-04-14T00:00:00 | 2              |
   | 2018-04-15T00:00:00 | 2              |
   +---------------------+----------------+
   81 rows in set. Query took 0.024 seconds.
   ❯
   ```
   
   ## `27.0.0` fails
   ```shell
   DataFusion CLI v27.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   Optimizer rule 'simplify_expressions' failed
   caused by
   Schema error: No field named "test_data.parquet".day. Valid fields are "test_data.parquet.day", "COUNT(DISTINCT test_data.parquet.direction)".
   ❯
   ```
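
Reading the quoting in that error literally: the lookup asks for a column `day` qualified by the relation `test_data.parquet`, while the schema only offers unqualified fields whose entire name is the flat string `test_data.parquet.day`. Below is a minimal Python sketch of that mismatch. It is an illustrative model only, not DataFusion's actual schema code; `Field` and `find` are hypothetical names, and the assumption that the valid fields carry no qualifier is inferred from the message text alone.

```python
from typing import List, NamedTuple, Optional


class Field(NamedTuple):
    """Hypothetical stand-in for a schema field: optional relation qualifier plus a name."""
    qualifier: Optional[str]
    name: str


def find(fields: List[Field], qualifier: Optional[str], name: str) -> Optional[Field]:
    """Return the first field whose qualifier AND name both match, else None."""
    for f in fields:
        if f.qualifier == qualifier and f.name == name:
            return f
    return None


# Assumption: the "valid fields" in the error are unqualified, with the dots
# being part of the field name itself.
schema = [
    Field(None, "test_data.parquet.day"),
    Field(None, "COUNT(DISTINCT test_data.parquet.direction)"),
]

# Lookup as reported in the error: relation "test_data.parquet", column "day".
wanted = find(schema, "test_data.parquet", "day")

# Lookup by the flat name that the schema actually contains.
flat = find(schema, None, "test_data.parquet.day")

print("qualified lookup:", wanted)
print("flat-name lookup:", flat)
```

In this model the qualified lookup finds nothing even though the flat name looks identical once the quotes are dropped, which matches the shape of the failure shown above.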
   
   ## `main` passes
   ```shell
   $ git checkout main
   Already on 'main'
   Your branch is up to date with 'apache/main'.
   $ CARGO_TARGET_DIR=/Users/alamb/Software/target-df cargo run
       Finished dev [unoptimized + debuginfo] target(s) in 0.27s
        Running `/Users/alamb/Software/target-df/debug/datafusion-cli`
   DataFusion CLI v27.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   +---------------------+----------------+
   | date                | num_directions |
   +---------------------+----------------+
   | 2011-09-09T00:00:00 | 2              |
   | 2011-09-10T00:00:00 | 2              |
   ...
   | 2018-04-14T00:00:00 | 2              |
   | 2018-04-15T00:00:00 | 2              |
   +---------------------+----------------+
   81 rows in set. Query took 0.027 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

