cyrilou242 opened a new issue, #11504: URL: https://github.com/apache/pinot/issues/11504
## Current As of today, `COUNT DISTINCT` is only supported for a single column. For instance, `COUNT(DISTINCT col1)`. ## Ask Add support for multiple columns: ``` COUNT(DISTINCT(col_1, ..., col_n)) ``` Examples: 1) Simple: ``` SELECT COUNT(DISTINCT(col1, col2, col3)) FROM table ``` 2) With a group by: ``` SELECT col_x, COUNT(DISTINCT(col1, col2, col3)) FROM table GROUP BY col_x ``` ## Current workaround *not tested, please correct if need be* 1) Simple: ``` with t as (SELECT DISTINCT col1, col2, col3) SELECT COUNT(*) FROM t ``` 2) With a group by: **NOT SURE - is it possible in an efficient manner?** ``` with t as (SELECT DISTINCT col_x, col1, col2, col3) SELECT col_x COUNT(col1) FROM t GROUP BY col_x ``` here doing the distinct on col_x could be very unnefficient. ## Why This will make it easy to count duplicates along some columns: `SELECT COUNT(*) - COUNT(DISTINCT(col1, col2, col3))` This will makes it easy to do count distinct along multiple columns in complex queries. When using the workaround of the sub query, some operations (eg filters) have to be moved in the inner query, while some have to be kept in the outer query. This is hard to write and can be unnefficient. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
