[ 
https://issues.apache.org/jira/browse/CALCITE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895365#comment-15895365
 ] 

Gian Merlino commented on CALCITE-1670:
---------------------------------------

Druid can do two aggregate passes using nested groupBys. I recently set up 
Druid to use the Calcite rule you're talking about, and it works well. The 
patch for that was: https://github.com/druid-io/druid/pull/3999. With that 
applied, APPROX_COUNT_DISTINCT is always approximate, and COUNT(DISTINCT col) 
is approximate by default but can be made exact through config. Not all queries 
work in exact mode. For example, you can't have two distinct counts on two 
different columns, since that would generate a plan that Druid's runtime 
doesn't support.
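
To make the two-pass idea concrete, here is a minimal sketch in plain Java. It 
is not Druid or Calcite code: the class name TwoPassDistinct, the method 
exactDistinctCounts, and the {groupKey, value} row layout are hypothetical, 
chosen only for illustration. The inner pass groups by (key, value) to collapse 
duplicates, and the outer pass groups by key and counts the surviving inner 
groups, which is the shape a nested groupBy would take.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: in-memory collections stand in for Druid rows.
// Nothing here is Druid or Calcite API.
final class TwoPassDistinct {
  /** Each row is {groupKey, value}; returns the exact distinct-value count per groupKey. */
  static Map<String, Long> exactDistinctCounts(List<String[]> rows) {
    // Pass 1 (inner groupBy): group by (key, value), collapsing duplicate pairs.
    Set<List<String>> innerGroups = new HashSet<>();
    for (String[] row : rows) {
      innerGroups.add(Arrays.asList(row[0], row[1]));
    }
    // Pass 2 (outer groupBy): group by key and count the inner groups.
    Map<String, Long> counts = new TreeMap<>();
    for (List<String> group : innerGroups) {
      counts.merge(group.get(0), 1L, Long::sum);
    }
    return counts;
  }
}
```

Because the inner pass keeps exactly one entry per (key, value) pair, the outer 
count is exact rather than a HyperLogLog estimate, at the cost of materializing 
every distinct pair.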

> Count distinct on druid is translated to Cardinality aggregator which is 
> approximate
> ------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1670
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1670
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Nishant Bangarwa
>            Assignee: Julian Hyde
>
> Right now count distinct on Druid is translated as a 'cardinality' aggregator 
> which uses HyperLogLog and returns approximate results. See the cardinality 
> aggregator documentation at 
> http://druid.io/docs/latest/querying/aggregations.html for details. 
> https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
> {code} 
> case COUNT:
>       if (aggCall.isDistinct()) {
>         return new JsonCardinalityAggregation("cardinality", name, list);
>       }
>       return new JsonAggregation("count", name, only);
> {code} 
> The current recommended way in Druid to get exact counts is to do a nested 
> groupBy query. 



