[jira] [Commented] (CALCITE-1587) Druid adapter: topN returns approximate results

Gian Merlino (JIRA) Wed, 18 Jan 2017 10:53:36 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828547#comment-15828547
 ]


Gian Merlino commented on CALCITE-1587:
---------------------------------------

In Druid's built-in SQL we make this an option, 
druid.sql.planner.useApproximateTopN. FWIW, we also have a similar option for 
whether COUNT(DISTINCT col) should be approximate or not.

Also, topNs are exact if you are sorting on the dimension, and will be faster 
than groupBy in that case since groupBy doesn't yet push down limits all the 
way to the data nodes (although we are working on this). So it's still useful, 
and exact, to use them for queries like "SELECT DISTINCT foo FROM bar ORDER BY 
foo LIMIT 50". In Druid we do this even if druid.sql.planner.useApproximateTopN 
is false.
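
To make the rule above concrete, here is a minimal sketch of the decision as I described it. It is not the actual Druid or Calcite planner code: the function name and its boolean parameters are hypothetical, and it assumes a query with a single grouping dimension.

```python
def choose_query_type(has_limit: bool,
                      orders_by_dimension: bool,
                      use_approximate_topn: bool) -> str:
    """Pick a Druid query type for a single-dimension grouping query.

    Hypothetical helper for illustration only; the parameter
    use_approximate_topn stands in for druid.sql.planner.useApproximateTopN.
    """
    if not has_limit:
        # Without a LIMIT there is nothing for topN to truncate.
        return "groupBy"
    if orders_by_dimension:
        # Ordering on the dimension itself is exact, so topN is always safe.
        return "topN"
    # Ordering on a metric is approximate; only allow it when opted in.
    return "topN" if use_approximate_topn else "groupBy"
```

Under this sketch, "SELECT DISTINCT foo FROM bar ORDER BY foo LIMIT 50" maps to topN regardless of the flag, while an ORDER BY on an aggregate falls back to groupBy when the flag is false.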

The topN approximation is described in detail at 
http://druid.io/docs/latest/querying/topnquery.html#aliasing
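
As a rough intuition for why the ordering matters, here is a toy model, not Druid's implementation. The function names and the data are made up, and the per-node cutoff is deliberately set to k to exaggerate the effect. Each data node returns only its local top k rows and the broker merges what it receives.

```python
from collections import Counter

def merge_top_k(nodes, k, by_metric=True):
    """Toy broker: every node sends only its local top k rows, then merge."""
    merged = Counter()
    for rows in nodes:
        if by_metric:
            # Rank by metric descending, dimension value as tie-breaker.
            local = sorted(rows.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        else:
            # Rank by dimension value ascending.
            local = sorted(rows.items())[:k]
        merged.update(dict(local))  # sums metrics per dimension value
    if by_metric:
        ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        ranked = sorted(merged.items())
    return ranked[:k]

def exact_top_k(nodes, k, by_metric=True):
    """Reference answer: merge all rows first, then rank and truncate."""
    total = Counter()
    for rows in nodes:
        total.update(rows)
    if by_metric:
        ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        ranked = sorted(total.items())
    return ranked[:k]

# Two hypothetical data nodes holding per-dimension metric sums.
nodes = [{"a": 10, "b": 9, "c": 1}, {"c": 10, "b": 9, "a": 1}]
```

With this data and k = 1, ordering by the metric makes the merged result pick "a" even though "b" has the largest total, because "b" is never any node's local winner. Ordering by the dimension gives the same answer as the exact computation: a value that is among the first k globally is also among the first k on every node that holds it, so none of its rows are dropped.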

> Druid adapter: topN returns approximate results
> -----------------------------------------------
>
>                 Key: CALCITE-1587
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1587
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>    Affects Versions: 1.11.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Julian Hyde
>             Fix For: 1.12.0
>
>
> Currently, we convert to _topN_ queries. However, metrics returned by Druid 
> will be approximate values. Thus, probably we should not convert to Druid 
> topN queries and rather always use Druid groupBy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
