[jira] [Commented] (CALCITE-1803) Add post aggregation support in Druid to optimize druid queries.

Julian Hyde (JIRA) Mon, 22 May 2017 17:57:54 -0700

    [ https://issues.apache.org/jira/browse/CALCITE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020492#comment-16020492 ]


Julian Hyde commented on CALCITE-1803:
--------------------------------------

Do you mind if we change the terminology? "Post aggregation" suggests 
aggregation that happens after something. But I think you mean 
"Post-aggregation projects". Or in simpler English, "Projects after 
aggregation".

To answer your question: You will need to have a DruidQuery that contains a 
Scan followed by an Aggregate followed by a Project.

Currently DruidProjectRule will not allow the Project to be pushed in, because 
"sap" (scan, aggregate, project) is not a valid signature according to 
DruidQuery.VALID_SIG. But you should make it valid.
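
To illustrate the kind of change involved, here is a minimal Java sketch of a 
signature check. The class name, both pattern strings, and the letters "f" 
(filter) and "l" (limit) are assumptions made for illustration. They are not 
the actual value of DruidQuery.VALID_SIG, which should be checked in the 
Calcite source.

```java
import java.util.regex.Pattern;

/** Illustrative only; not the actual Calcite source. */
final class SignatureSketch {
  // Hypothetical "before" pattern: a scan, then an optional filter, project,
  // aggregate and limit, in that order. No project is allowed after the
  // aggregate.
  static final Pattern BEFORE = Pattern.compile("sf?p?a?l?");

  // Hypothetical "after" pattern: additionally accepts a project that
  // directly follows the aggregate, so that "sap" becomes valid.
  static final Pattern AFTER = Pattern.compile("sf?p?(a?|ap)l?");

  /** Returns whether the whole signature string matches the pattern. */
  static boolean isValid(Pattern validSig, String signature) {
    return validSig.matcher(signature).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid(BEFORE, "sap")); // false
    System.out.println(isValid(AFTER, "sap"));  // true
  }
}
```

A rule such as DruidProjectRule would then only push the Project into the 
DruidQuery when the extended signature is valid.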

I'm curious:
* Does Druid allow filters after aggregation? (I.e. HAVING)
* I know that Druid allows sort after aggregation. But is this before or after 
the post-aggregation projects?

> Add post aggregation support in Druid to optimize druid queries.
> ----------------------------------------------------------------
>
>                 Key: CALCITE-1803
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1803
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.11.0
>            Reporter: Junxian Wu
>            Assignee: Julian Hyde
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Druid post-aggregations are not supported when parsing SQL queries. By 
> implementing post-aggregations, we can offload some computation to the Druid 
> cluster rather than performing it on the client side.
> Example usage:
> {{SELECT SUM("column1") - SUM("column2") FROM "table";}}
> Under the current rules, this query is parsed into two separate Druid 
> aggregations, and the results are then subtracted in Calcite. By using the 
> {{postAggregations}} field in the Druid query, the subtraction could be done 
> in the Druid cluster. Although this example is simple, the difference 
> becomes significant when the number of result rows is large. (Multiple 
> result rows occur when GROUP BY is used.)
> Questions:
> After I push the post-aggregation into the Druid query, what should I change 
> in the project relational expression? In the example above, the 
> {{BindableProject}} holds the expression that represents the subtraction. 
> If I push the post-aggregation into the Druid query, that subtraction 
> expression should be replaced by a reference to the post-aggregation 
> result. For now, it seems that a project expression can only point to 
> aggregation results. Since post-aggregations also have to point to 
> aggregation results, they cannot be placed at the same level as the 
> aggregations. Where should I put post-aggregations?
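
For reference, the Druid query that the example SQL could translate to can be 
sketched as follows. This is only an illustration: the class and method names, 
the aggregator names "s1", "s2" and "diff", the choice of a timeseries query 
with {{doubleSum}} aggregators, and the placeholder interval are all 
assumptions, and the string building does no JSON escaping.

```java
/** Illustrative only; builds the JSON text of a Druid query by hand. */
final class PostAggregationSketch {
  /** Builds a query that sums two columns and subtracts the sums in Druid. */
  static String buildQuery(String dataSource, String column1, String column2) {
    return "{"
        + "\"queryType\":\"timeseries\","
        + "\"dataSource\":\"" + dataSource + "\","
        + "\"granularity\":\"all\","
        // Placeholder interval, assumed wide enough to cover all data.
        + "\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],"
        // The two SUM calls become two ordinary aggregators.
        + "\"aggregations\":[" + sum("s1", column1) + "," + sum("s2", column2) + "],"
        // The subtraction becomes an arithmetic post-aggregator that refers
        // to the aggregators above by name.
        + "\"postAggregations\":[{"
        + "\"type\":\"arithmetic\",\"name\":\"diff\",\"fn\":\"-\","
        + "\"fields\":[" + fieldAccess("s1") + "," + fieldAccess("s2") + "]"
        + "}]"
        + "}";
  }

  /** A doubleSum aggregator over one column. */
  static String sum(String name, String fieldName) {
    return "{\"type\":\"doubleSum\",\"name\":\"" + name
        + "\",\"fieldName\":\"" + fieldName + "\"}";
  }

  /** A reference from a post-aggregator to an aggregator's output. */
  static String fieldAccess(String aggregatorName) {
    return "{\"type\":\"fieldAccess\",\"fieldName\":\"" + aggregatorName + "\"}";
  }

  public static void main(String[] args) {
    System.out.println(buildQuery("table", "column1", "column2"));
  }
}
```

With this shape, the Project only needs to reference the "diff" output, 
because the subtraction itself is evaluated inside Druid.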



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
