NGA-TRAN opened a new issue, #9586: URL: https://github.com/apache/arrow-datafusion/issues/9586
### Describe the bug Wrong result when there are 2 count distinct in the select clause, ### To Reproduce I will share there file with @alamb because I cannot attached .parquet file here ### Bug ```SQL -- BUG SELECT COUNT(DISTINCT host) AS servers_count, count(distinct pool) as pool_count, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY server_role, os, env, datacenter; +---------------+------------+-------------+---------+------------+------------+ | servers_count | pool_count | server_role | os | env | datacenter | +---------------+------------+-------------+---------+------------+------------+ | 1 | 1 | mesg | windows | production | mn | | 2 | 2 | mesg | windows | production | va | -- should have 4 servers_count and 3 pool_count +---------------+------------+-------------+---------+------------+------------+ ``` ### Here are other queries running in DF CLI that tell me the right results ```SQL SELECT DISTINCT host, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' order by datacenter; +---------------+-------------+---------+------------+------------+ | host | server_role | os | env | datacenter | +---------------+-------------+---------+------------+------------+ | mpm-mesg1002c | mesg | windows | production | mn | | vpm-mesg1001c | mesg | windows | production | va | | vpm-mesg1005b | mesg | windows | production | va | | vpm-mesg1008d | mesg | windows | production | va | | vpm-mesg1007b | mesg | windows | production | va | +---------------+-------------+---------+------------+------------+ SELECT DISTINCT pool, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' order by datacenter; +------+-------------+---------+------------+------------+ | pool | server_role | os | env | datacenter | +------+-------------+---------+------------+------------+ | c | mesg | windows | production | mn | | b | mesg | windows | production | va | | c | mesg | windows | production | va | | d | mesg | windows | production | va | +------+-------------+---------+------------+------------+ SELECT COUNT(DISTINCT host) AS servers_count, "server_role", "os", "env", "datacenter" from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY "server_role", "os", "env", "datacenter"; +---------------+-------------+---------+------------+------------+ | servers_count | server_role | os | env | datacenter | +---------------+-------------+---------+------------+------------+ | 4 | mesg | windows | production | va | | 1 | mesg | windows | production | mn | +---------------+-------------+---------+------------+------------+ SELECT count(distinct pool) as pool_count, "server_role", "os", "env", "datacenter" from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY "server_role", "os", "env", "datacenter"; +------------+-------------+---------+------------+------------+ | pool_count | server_role | os | env | datacenter | +------------+-------------+---------+------------+------------+ | 1 | mesg | windows | production | mn | | 3 | mesg | windows | production | va | +------------+-------------+---------+------------+------------+ ``` ### Expected behavior ```SQL SELECT COUNT(DISTINCT host) AS servers_count, count(distinct pool) as pool_count, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY server_role, os, env, datacenter; +---------------+------------+-------------+---------+------------+------------+ | servers_count | pool_count | server_role | os | env | datacenter | +---------------+------------+-------------+---------+------------+------------+ | 1 | 1 | mesg | windows | production | mn | | 4 | 3 | mesg | windows | production | va | +---------------+------------+-------------+---------+------------+------------+ ``` ``` ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
