[ https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044422#comment-16044422 ]

slim bouguerra commented on CALCITE-1787:
-----------------------------------------

+1 for the idea of an abstract metric, or what we call in Druid a complex
metric.
Are we saying that, for this to work, the Druid user has to follow this naming
convention for columns?
Does this still work if we have multiple sketches per user? (This is a pretty
common use case: the user is tracked via multiple streams, hence multiple
sketches.)
How will Calcite know whether a given sketch can be used as a histogram or as
a count?
Keep in mind that hyperUnique, like Theta sketches or quantile histograms, is
a UDF, so the same table can contain different UDFs that do the same thing,
each with its own API and capabilities.
For example, Theta sketches (Yahoo sketches) and Druid HLL can both be used to
estimate unique users, but a Theta sketch supports union, intersection and
subtraction, while HLL only supports union.
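One way to make the capability question concrete is to record, for each
complex-metric column, which set operations its sketch supports, and let the
planner pick a column that can answer the query. The sketch below is
hypothetical: SketchKind, SketchOp and SketchCatalog are illustrative names,
not Calcite or Druid APIs, and it assumes the planner has per-column type
metadata. The capability sets reflect the Theta vs. HLL difference described
above.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical: set operations a sketch-backed complex metric may support.
enum SketchOp { UNION, INTERSECT, NOT }

// Hypothetical complex-metric kinds; names are illustrative only.
enum SketchKind {
  // Theta sketches support union, intersection and set difference.
  THETA_SKETCH(EnumSet.of(SketchOp.UNION, SketchOp.INTERSECT, SketchOp.NOT)),
  // Druid's HLL-based hyperUnique only supports union.
  HYPER_UNIQUE(EnumSet.of(SketchOp.UNION));

  private final Set<SketchOp> ops;

  SketchKind(Set<SketchOp> ops) {
    this.ops = ops;
  }

  boolean supports(SketchOp op) {
    return ops.contains(op);
  }
}

final class SketchCatalog {
  private SketchCatalog() {}

  // Returns the first column (in map iteration order) whose sketch kind
  // supports the requested operation, or empty if no column can serve it.
  static Optional<String> pickColumn(Map<String, SketchKind> columns, SketchOp op) {
    for (Map.Entry<String, SketchKind> e : columns.entrySet()) {
      if (e.getValue().supports(op)) {
        return Optional.of(e.getKey());
      }
    }
    return Optional.empty();
  }

  // Convenience for building an ordered column map, e.g. for tests.
  static Map<String, SketchKind> columns() {
    return new LinkedHashMap<>();
  }
}
```

With this shape, a table that tracks the same users through both a
hyperUnique column and a Theta sketch column would route a union query to
either one, but an intersection query only to the Theta sketch column.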

> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Zain Humayun
>            Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to measure the cardinality of a column 
> quickly. Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.
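The type check proposed in the issue (emit a thetaSketch aggregator only when
the column's type is thetaSketch, and likewise for hyperUnique) could look
roughly like the following. This is a minimal sketch, not Calcite's actual
DruidQuery.getJsonAggregation code: DruidColumnType and ApproxAggregations are
hypothetical names. The emitted JSON uses the "type", "name" and "fieldName"
fields of Druid's thetaSketch and hyperUnique aggregators. Optional fields such
as "size" are omitted.

```java
import java.util.Optional;

// Hypothetical column types; Calcite's real DruidType enum may differ.
enum DruidColumnType { LONG, DOUBLE, STRING, THETA_SKETCH, HYPER_UNIQUE }

final class ApproxAggregations {
  private ApproxAggregations() {}

  // Wraps a value as a JSON string literal, escaping backslashes and quotes.
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /**
   * Returns the Druid aggregator JSON for an approximate distinct count over
   * fieldName, or empty if the column is not a sketch-typed complex metric.
   */
  static Optional<String> approxCountDistinct(
      String outputName, String fieldName, DruidColumnType type) {
    final String druidType;
    switch (type) {
      case THETA_SKETCH:
        druidType = "thetaSketch";
        break;
      case HYPER_UNIQUE:
        druidType = "hyperUnique";
        break;
      default:
        // Only produce a sketch aggregator when the column holds a sketch.
        return Optional.empty();
    }
    return Optional.of("{\"type\":" + quote(druidType)
        + ",\"name\":" + quote(outputName)
        + ",\"fieldName\":" + quote(fieldName) + "}");
  }
}
```

For example, approxCountDistinct("uniq_users", "user_sketch",
DruidColumnType.THETA_SKETCH) yields
{"type":"thetaSketch","name":"uniq_users","fieldName":"user_sketch"}, while a
plain STRING column yields empty, leaving the caller to fall back to some
other plan. Returning empty, rather than guessing an aggregator, keeps the
decision tied to the column metadata the issue says is needed.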



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to