[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

Zain Humayun (JIRA) Tue, 23 May 2017 09:34:17 -0700

    [ https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021425#comment-16021425 ]


Zain Humayun edited comment on CALCITE-1787 at 5/23/17 4:33 PM:
----------------------------------------------------------------

My aim is to be able to write something like 

{{SELECT COUNT(DISTINCT "col") FROM "table";}} and have Calcite generate one of:
* thetaSketch aggregate
* hyperUnique aggregate 
* cardinality aggregate (already in Calcite; the default aggregate for 
COUNT(DISTINCT) queries)

Calcite can determine which aggregate to generate by looking at the DruidType 
that the metadata query reports for "col". That logic would go in the 
{{getJsonAggregation}} method in {{DruidQuery}}.
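
As a rough illustration of the dispatch described above, here is a minimal 
sketch in Java. It is not the adapter's actual code: the class 
CountDistinctSketch, the enum ColumnType and the method aggregatorFor are 
hypothetical stand-ins for DruidType and the logic that might live in 
{{getJsonAggregation}}. It only assumes that the column type is already known 
from the metadata query.

```java
/** Hypothetical sketch; not part of Calcite's Druid adapter. */
final class CountDistinctSketch {

  /** Hypothetical stand-in for the column types reported by Druid metadata. */
  enum ColumnType { STRING, LONG, FLOAT, HYPER_UNIQUE, THETA_SKETCH }

  private CountDistinctSketch() {}

  /**
   * Picks the Druid aggregator type name for COUNT(DISTINCT col),
   * based only on the column's type.
   */
  static String aggregatorFor(ColumnType type) {
    switch (type) {
      case THETA_SKETCH:
        // Column already holds theta sketches: use the sketch aggregator.
        return "thetaSketch";
      case HYPER_UNIQUE:
        // Column already holds HyperLogLog data: use hyperUnique.
        return "hyperUnique";
      default:
        // Plain dimension: fall back to the existing cardinality aggregator.
        return "cardinality";
    }
  }
}
```

Keeping cardinality as the default branch means queries on ordinary columns 
would behave as they do today; only columns of the two sketch types would 
take a new path.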


was (Author: zhumayun):
My aim is to write 

{{SELECT COUNT(DISTINCT "col") FROM "table";}} and let calcite generate one of:
* thetaSketch aggregate
* hyperUnique aggregate 
* cardinality aggregate (already in calcite, default aggregate for count 
distinct queries)

Calcite can determine which aggregate to generate by looking at the DruidType 
for "col" from the metadata query. The logic for that would just go in the 
{{getJsonAggregation}} method in DruidQuery.

> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Julian Hyde
>            Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to quickly estimate the cardinality of a 
> column. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could add the logic for {{hyperUnique}} at the same time.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
