[ https://issues.apache.org/jira/browse/CALCITE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895355#comment-15895355 ]
Julian Hyde commented on CALCITE-1670:
--------------------------------------
In CALCITE-1587 we added a property, approximateDistinctCount. The idea was to
push distinct-count down to Druid's Cardinality if approximateDistinctCount is
true.
I would also like to be able to declare that a particular aggregate call is
approximate; in CALCITE-1588 [~gian] remarked that Druid SQL has an operator
called {{APPROX_COUNT_DISTINCT}}.
I wasn't aware that there was a way to accomplish an exact distinct-count in
Druid. We have a rewrite rule in Calcite that can do it by generating two
levels of Aggregate. I believe (please correct me if I'm wrong) that Druid can
only do one Aggregate pass. If so, maybe we could enable that rule and push
one of the levels of Aggregate down to Druid.
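
To make the two-level idea concrete, here is a minimal sketch in plain Java. It
is not Calcite's rule and does not use Druid's API; the class and method names
are hypothetical, and NULL handling is left out. It mimics rewriting
SELECT k, COUNT(DISTINCT v) FROM t GROUP BY k as an inner Aggregate on (k, v)
followed by an outer Aggregate on k with a plain COUNT.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only: names are not taken from Calcite or Druid.
class TwoLevelDistinctCount {

  // Inner Aggregate: GROUP BY k, v. Duplicate (k, v) pairs collapse to one row.
  public static Set<Map.Entry<String, String>> innerAggregate(
      List<Map.Entry<String, String>> rows) {
    return new LinkedHashSet<>(rows);
  }

  // Outer Aggregate: GROUP BY k with COUNT(*) over the de-duplicated rows.
  public static Map<String, Long> outerAggregate(
      Set<Map.Entry<String, String>> distinctRows) {
    Map<String, Long> counts = new TreeMap<>();
    for (Map.Entry<String, String> row : distinctRows) {
      counts.merge(row.getKey(), 1L, Long::sum);
    }
    return counts;
  }

  // Exact COUNT(DISTINCT v) per k, computed as two Aggregate passes.
  public static Map<String, Long> exactDistinctCount(
      List<Map.Entry<String, String>> rows) {
    return outerAggregate(innerAggregate(rows));
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> rows = List.of(
        Map.entry("a", "x"), Map.entry("a", "x"), Map.entry("a", "y"),
        Map.entry("b", "z"), Map.entry("b", "z"));
    System.out.println(exactDistinctCount(rows)); // prints {a=2, b=1}
  }
}
```

In this picture only one of the two passes would need to run in Druid; the
other could stay in Calcite.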
> Count distinct on druid is translated to Cardinality aggregator which is
> approximate
> ------------------------------------------------------------------------------------
>
> Key: CALCITE-1670
> URL: https://issues.apache.org/jira/browse/CALCITE-1670
> Project: Calcite
> Issue Type: Bug
> Reporter: Nishant Bangarwa
> Assignee: Julian Hyde
>
> Right now count distinct on Druid is translated to a 'cardinality' aggregator,
> which uses HyperLogLog and returns approximate results. For details, see the
> cardinality aggregator at
> http://druid.io/docs/latest/querying/aggregations.html.
> https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
> {code}
> case COUNT:
>   if (aggCall.isDistinct()) {
>     return new JsonCardinalityAggregation("cardinality", name, list);
>   }
>   return new JsonAggregation("count", name, only);
> {code}
> The currently recommended way in Druid to get exact counts is to run a nested
> groupBy query.