cyrilou242 opened a new issue, #11504:
URL: https://github.com/apache/pinot/issues/11504

   ## Current
   As of today, 
   `COUNT DISTINCT` is only supported for a single column.  
   For instance, `COUNT(DISTINCT col1)`.
   
   ## Ask
   Add support for multiple columns: 
   ```
   COUNT(DISTINCT(col_1, ..., col_n))
   ```
   
   Examples:
   1) Simple:
   ```
   SELECT 
   COUNT(DISTINCT(col1, col2, col3))
   FROM table
   ```
   
   2) With a group by:
   ```
   SELECT 
   col_x, 
   COUNT(DISTINCT(col1, col2, col3))
   FROM table
   GROUP BY col_x
   ```
   
   ## Current workaround
   *not tested, please correct if need be*
   
   1) Simple:
   ```
   with t as (SELECT DISTINCT col1, col2, col3)
   SELECT 
   COUNT(*)
   FROM t
   ```
   
   2) With a group by:
   **NOT SURE - is it possible in an efficient manner?**
   ```
   with t as (SELECT DISTINCT col_x, col1, col2, col3)
   SELECT 
   col_x
   COUNT(col1)
   FROM t
   GROUP BY col_x
   ```
   here doing the distinct on col_x could be very unnefficient.
   
   ## Why
   This will make it easy to count duplicates along some columns: `SELECT 
COUNT(*) - COUNT(DISTINCT(col1, col2, col3))`
   
   This will makes it easy to do count distinct along multiple columns in 
complex queries. 
   When using the workaround of the sub query, some operations (eg filters) 
have to be moved in the inner query, while some have to be kept in the outer 
query. This is hard to write and can be unnefficient.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to