[jira] [Commented] (CALCITE-1670) Count distinct on druid is translated to Cardinality aggregator which is approximate

Julian Hyde (JIRA) Fri, 03 Mar 2017 17:10:55 -0800

    [ https://issues.apache.org/jira/browse/CALCITE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895355#comment-15895355 ]


Julian Hyde commented on CALCITE-1670:
--------------------------------------

In CALCITE-1587 we added a property, {{approximateDistinctCount}}. The idea 
was to push distinct-count down to Druid's cardinality aggregator if 
{{approximateDistinctCount}} is true.

I would also like to be able to declare that a particular aggregate call is 
approximate; in CALCITE-1588 [~gian] remarked that Druid SQL has an operator 
called {{APPROX_COUNT_DISTINCT}}.

I wasn't aware that there was a way to accomplish exact distinct-count in 
Druid. We have a rewrite rule in Calcite that can do it; it generates two 
levels of Aggregate. I believe (please correct me if I'm wrong) that Druid can 
only do one Aggregate pass. If so, maybe we could enable that rule and push 
one of the levels of Aggregate down to Druid.
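
As a rough illustration of the two-level idea (this is not Calcite's rule 
implementation and not Druid code), the Java sketch below computes an exact 
distinct count in two passes: an inner grouping on the value, then an outer 
count of the resulting groups. The class and method names are hypothetical, 
and nulls are dropped on the assumption that we want the behavior of SQL 
COUNT(DISTINCT x).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical illustration only; names are not taken from Calcite or Druid.
public class TwoLevelDistinctCount {

  /** Inner Aggregate: GROUP BY value, giving one entry per distinct non-null value. */
  public static Map<String, Long> groupByValue(List<String> values) {
    return values.stream()
        .filter(Objects::nonNull) // COUNT(DISTINCT x) ignores nulls
        .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
  }

  /** Outer Aggregate: count the groups produced by the inner level. */
  public static long countGroups(Map<String, Long> groups) {
    return groups.size();
  }

  /** Exact distinct count expressed as the composition of the two levels. */
  public static long exactDistinctCount(List<String> values) {
    return countGroups(groupByValue(values));
  }

  public static void main(String[] args) {
    List<String> users = Arrays.asList("a", "b", "a", null, "c", "b");
    System.out.println(exactDistinctCount(users)); // prints 3
  }
}
```

In SQL terms this roughly corresponds to rewriting SELECT COUNT(DISTINCT x) 
FROM t as SELECT COUNT(x) FROM (SELECT x FROM t GROUP BY x). If only one level 
can be pushed down, the inner GROUP BY is the natural candidate, with the 
outer count evaluated on the Calcite side.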

> Count distinct on druid is translated to Cardinality aggregator which is 
> approximate
> ------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1670
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1670
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Nishant Bangarwa
>            Assignee: Julian Hyde
>
> Right now count distinct on Druid is translated to a 'cardinality' aggregator, 
> which uses HyperLogLog and returns approximate results. See the cardinality 
> aggregator at http://druid.io/docs/latest/querying/aggregations.html for 
> details. 
> https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
> {code} 
> case COUNT:
>       if (aggCall.isDistinct()) {
>         return new JsonCardinalityAggregation("cardinality", name, list);
>       }
>       return new JsonAggregation("count", name, only);
> {code} 
> The currently recommended way in Druid to get exact counts is to do a nested 
> groupBy query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
