NGA-TRAN commented on issue #9586: URL: https://github.com/apache/arrow-datafusion/issues/9586#issuecomment-1997711716
Many thanks to @erratic-pattern, who has identified the PR that introduced this bug:

The issue is related to [this PR](https://github.com/apache/arrow-datafusion/pull/9234) for array aggregate order and distinct. The commit immediately before it works correctly with the above queries.

In arrow-datafusion:

```
cd datafusion-cli
git checkout 0e728fce0a1a87567979bc74ebb64951b0fd9ac8
cargo build
./target/debug/datafusion-cli -f ../bug.sql
DataFusion CLI v36.0.0
+---------------+------------+------------+
| servers_count | pool_count | datacenter |
+---------------+------------+------------+
| 1             | 1          | mn         |
| 4             | 3          | va         |
+---------------+------------+------------+
```

If you then run the same query after the above PR, you get the incorrect result:

```
git checkout fc84a639fca7716e529384c0e919fb90b75139da
cargo build
./target/debug/datafusion-cli -f ../bug.sql
DataFusion CLI v36.0.0
+---------------+------------+------------+
| servers_count | pool_count | datacenter |
+---------------+------------+------------+
| 1             | 1          | mn         |
| 3             | 2          | va         |
+---------------+------------+------------+
2 rows in set. Query took 0.534 seconds.
```

bug.sql:

```sql
SELECT COUNT(DISTINCT host) AS servers_count,
       count(distinct pool) as pool_count,
       datacenter
from '/tmp/file.parquet'
WHERE time >= '2024-02-25T00:00:00Z'
  and time < '2024-02-25T00:00:01Z'
  and server_role = 'mesg'
GROUP BY datacenter;
```

We are working on sharing the file.
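For anyone who wants to cross-check which of the two outputs is right once the file is available, here is a minimal sketch of what `COUNT(DISTINCT ...)` with `GROUP BY` is expected to compute, written in plain Python. It is not DataFusion code. The helper name `count_distinct_by_group` and the sample rows are hypothetical and are not taken from `/tmp/file.parquet`; it also assumes the `WHERE` filter has already been applied to the rows.

```python
from collections import defaultdict


def count_distinct_by_group(rows, group_key, distinct_keys):
    """Reference COUNT(DISTINCT col) ... GROUP BY group_key.

    Keeps one set of seen values per (group, column) and returns set sizes.
    Hypothetical helper for illustration only.
    """
    seen = defaultdict(lambda: {k: set() for k in distinct_keys})
    for row in rows:
        group = seen[row[group_key]]
        for k in distinct_keys:
            value = row.get(k)
            if value is not None:  # SQL COUNT(DISTINCT x) ignores NULLs
                group[k].add(value)
    return {g: {k: len(v) for k, v in cols.items()} for g, cols in seen.items()}


# Hypothetical rows, NOT the contents of the real parquet file.
rows = [
    {"datacenter": "mn", "host": "h1", "pool": "p1"},
    {"datacenter": "va", "host": "h2", "pool": "p1"},
    {"datacenter": "va", "host": "h2", "pool": "p2"},
    {"datacenter": "va", "host": "h3", "pool": None},
]

result = count_distinct_by_group(rows, "datacenter", ["host", "pool"])
for datacenter, counts in sorted(result.items()):
    print(counts["host"], counts["pool"], datacenter)
```

Each distinct aggregate keeps its own independent set per group, so the count for `host` must not depend on the values of `pool`, and vice versa. Comparing such a reference against both builds should show which commit returns the correct numbers.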
