[jira] [Commented] (CALCITE-1787) thetaSketch Support for Druid Adapter

Zain Humayun (JIRA) Thu, 15 Jun 2017 14:43:40 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051104#comment-16051104
 ]


Zain Humayun commented on CALCITE-1787:
---------------------------------------

Recap and some implementation questions:

Columns of type thetaSketch/hyperUnique should be moved from the "metrics" 
field of the model definition to a new "complexMetrics" field. 
Each complex metric will have the form:
{code:none}
{
  "name" : <name used in SQL statements>,
  "type" : <type>,
  "metricName" : <name of underlying metric in Druid>
}
{code}
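As a rough illustration of how DruidTable might hold these entries, here is a minimal Java sketch. All names in it (ComplexMetricType, ComplexMetric, ComplexMetricIndex) are hypothetical and are not part of Calcite's current API; it only assumes the three fields listed above and that "type" is either thetaSketch or hyperUnique.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Complex metric kinds discussed in this issue (hypothetical enum). */
enum ComplexMetricType {
  THETA_SKETCH("thetaSketch"),
  HYPER_UNIQUE("hyperUnique");

  /** The "type" string as written in the model definition. */
  final String modelName;

  ComplexMetricType(String modelName) {
    this.modelName = modelName;
  }

  /** Maps the "type" string from the model definition to an enum value. */
  static ComplexMetricType fromModelName(String s) {
    for (ComplexMetricType t : values()) {
      if (t.modelName.equals(s)) {
        return t;
      }
    }
    throw new IllegalArgumentException("Unknown complex metric type: " + s);
  }
}

/** One entry of the proposed "complexMetrics" list. */
final class ComplexMetric {
  final String name;             // name used in SQL statements
  final ComplexMetricType type;  // thetaSketch or hyperUnique
  final String metricName;       // name of the underlying metric in Druid

  ComplexMetric(String name, ComplexMetricType type, String metricName) {
    this.name = name;
    this.type = type;
    this.metricName = metricName;
  }
}

/** Hypothetical per-table index that DruidTable could hold. */
final class ComplexMetricIndex {
  private final Map<String, ComplexMetric> bySqlName = new LinkedHashMap<>();

  ComplexMetricIndex(List<ComplexMetric> metrics) {
    for (ComplexMetric m : metrics) {
      // Two entries with the same SQL name would make lookups ambiguous.
      if (bySqlName.put(m.name, m) != null) {
        throw new IllegalArgumentException("Duplicate complex metric: " + m.name);
      }
    }
  }

  /** Returns the entry registered under a SQL name, or null for ordinary columns. */
  ComplexMetric lookup(String sqlName) {
    return bySqlName.get(sqlName);
  }

  Map<String, ComplexMetric> all() {
    return Collections.unmodifiableMap(bySqlName);
  }
}
```

Keying by the SQL-visible name is deliberate: the validator and DruidQuery only ever see that name, and the Druid metric name is recovered from the entry.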
This data will be stored in DruidTable. Note: although model definitions will 
provide this information, Calcite will also have to rename any sketch columns 
it discovers through the metadata query (for when no model definition is available). 
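For the no-model case, one possible shape is to split the metric columns reported by Druid into plain metrics and sketch columns, giving each sketch column a new SQL name. This is only a sketch: it assumes the metadata query reports "thetaSketch" or "hyperUnique" as the column type, and the renaming rule is a hypothetical parameter, since no naming convention has been decided yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Splits metric columns reported by Druid into plain and complex metrics (hypothetical). */
final class MetadataColumnSplitter {
  /** Column types treated as sketch columns in this sketch. */
  static final Set<String> COMPLEX_TYPES = Set.of("thetaSketch", "hyperUnique");

  /** A sketch column exposed to SQL under a new name. */
  static final class Renamed {
    final String druidType;
    final String druidName;

    Renamed(String druidType, String druidName) {
      this.druidType = druidType;
      this.druidName = druidName;
    }
  }

  final List<String> plainMetrics = new ArrayList<>();
  final Map<String, Renamed> complexBySqlName = new LinkedHashMap<>();

  /**
   * @param columnTypes metric column name to type, as reported by the metadata query
   * @param rename      hypothetical rule that derives the SQL name of a sketch column
   */
  MetadataColumnSplitter(Map<String, String> columnTypes, UnaryOperator<String> rename) {
    for (Map.Entry<String, String> e : columnTypes.entrySet()) {
      String druidName = e.getKey();
      String type = e.getValue();
      if (!COMPLEX_TYPES.contains(type)) {
        plainMetrics.add(druidName);
        continue;
      }
      String sqlName = rename.apply(druidName);
      // The new name must differ from every Druid column and every earlier renamed sketch.
      if (columnTypes.containsKey(sqlName) || complexBySqlName.containsKey(sqlName)) {
        throw new IllegalStateException("Renamed column clashes: " + sqlName);
      }
      complexBySqlName.put(sqlName, new Renamed(type, druidName));
    }
  }
}
```

Rejecting clashes up front keeps a renamed sketch column from silently shadowing an ordinary column of the same name.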

Calcite should reject any SQL statement that uses a complex metric 
incorrectly. Ideally, each complex metric should be able to tell the validation 
code which kinds of statements it can be used in. Any ideas on the best way to 
do so? Where is the best place to hook into the validation process and check 
for this kind of condition? At that point, we'll also need access to the 
DruidTable, because it will hold the information about the columns.
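To make "complex metrics indicate what kind of statements they can be used in" concrete, here is a minimal Java sketch in which each complex type declares the usages it permits. The Usage categories, the names SketchType and ComplexMetricCheck, and the rule that both types allow only COUNT(DISTINCT ...) are all assumptions for illustration; where to call this from inside Calcite's validator is exactly the open question above.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical places a column can appear in a query. */
enum Usage { COUNT_DISTINCT, OTHER_AGGREGATE, SELECT, FILTER, GROUP_BY }

/** Each complex type declares which usages it permits. */
enum SketchType {
  // In this sketch, both types are allowed only as the argument of COUNT(DISTINCT ...).
  THETA_SKETCH(EnumSet.of(Usage.COUNT_DISTINCT)),
  HYPER_UNIQUE(EnumSet.of(Usage.COUNT_DISTINCT));

  private final Set<Usage> allowed;

  SketchType(Set<Usage> allowed) {
    this.allowed = allowed;
  }

  boolean allows(Usage usage) {
    return allowed.contains(usage);
  }
}

/** A check the validator could run for every column reference (hypothetical). */
final class ComplexMetricCheck {
  private final Map<String, SketchType> complexColumns;

  ComplexMetricCheck(Map<String, SketchType> complexColumns) {
    this.complexColumns = complexColumns;
  }

  /** Throws if a complex column appears where its type forbids; ordinary columns always pass. */
  void check(String column, Usage usage) {
    SketchType type = complexColumns.get(column);
    if (type != null && !type.allows(usage)) {
      throw new IllegalArgumentException(
          "Column '" + column + "' of type " + type + " cannot be used in " + usage);
    }
  }
}
```

Putting the permitted usages on the type, rather than in the validator, means a new complex type only has to declare its own rules.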

Once validation has finished, DruidQuery will be responsible for working out 
what the actual column (the sketch column) is, based on the name and the 
context in which it is used.
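A minimal Java sketch of that resolution step: given the SQL column inside COUNT(DISTINCT ...), look up the complex metric and emit a Druid aggregator with "type", "name" and "fieldName", which is the shape the thetaSketch and hyperUnique aggregators use. The class SketchAggregations and its method are hypothetical and are not the existing getJsonAggregation code; JSON is built by hand here only to keep the example self-contained.

```java
import java.util.Map;

/** Hypothetical helper for the aggregation step in DruidQuery. */
final class SketchAggregations {
  /** What DruidTable would know about a complex column. */
  static final class Complex {
    final String druidType;   // "thetaSketch" or "hyperUnique"
    final String metricName;  // underlying Druid metric

    Complex(String druidType, String metricName) {
      this.druidType = druidType;
      this.metricName = metricName;
    }
  }

  private final Map<String, Complex> complexBySqlName;

  SketchAggregations(Map<String, Complex> complexBySqlName) {
    this.complexBySqlName = complexBySqlName;
  }

  /**
   * Builds the Druid aggregator for COUNT(DISTINCT sqlColumn). Returns null when
   * sqlColumn is not a complex metric, so the existing code path can handle it.
   */
  String countDistinct(String outputName, String sqlColumn) {
    Complex c = complexBySqlName.get(sqlColumn);
    if (c == null) {
      return null;
    }
    // The SQL name is replaced by the underlying Druid metric in "fieldName".
    return "{\"type\":" + quote(c.druidType)
        + ",\"name\":" + quote(outputName)
        + ",\"fieldName\":" + quote(c.metricName) + "}";
  }

  /** Minimal JSON string quoting: escapes double quotes and backslashes only. */
  private static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char ch : s.toCharArray()) {
      if (ch == '"' || ch == '\\') {
        sb.append('\\');
      }
      sb.append(ch);
    }
    return sb.append('"').toString();
  }
}
```

For example, with "unique_users" mapped to a thetaSketch metric named "user_id_sketch", countDistinct("EXPR$0", "unique_users") returns {"type":"thetaSketch","name":"EXPR$0","fieldName":"user_id_sketch"}, while an ordinary column returns null and falls through to the current logic.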

I believe the most complicated part of this will be validation. Do you have any 
general suggestions on where to start? I'm not very familiar with the 
calcite-core code. Thanks.  

> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Zain Humayun
>            Priority: Minor
>
> Currently, the Druid adapter does not support the 
> [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html]
>  aggregate type, which is used to quickly estimate the cardinality of a column. 
> Many Druid instances support theta sketches, so I think it would be 
> a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType 
> called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} 
> method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. 
> This will require accessing information about the columns (what data type 
> they are) so that the thetaSketch aggregate is only produced if the column's 
> type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but 
> a {{hyperUnique}} aggregate is never produced. Since both are approximate 
> aggregators, I could add the logic for {{hyperUnique}} at the same time.
> I'd love to hear your thoughts on my approach, and any suggestions you have 
> for this feature.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

