[ https://issues.apache.org/jira/browse/CALCITE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930539#comment-15930539 ]

Julian Hyde commented on CALCITE-1706:
--------------------------------------

[~bslim], Thanks. I disabled DruidAggregateFilterTransposeRule for now, and 
that returns us to the previous behavior, which was not too bad. With the rule 
enabled, the test was failing due to CALCITE-1436 (this wouldn't happen in 
Hive, but still, a failing test reduces coverage).

We should consider re-enabling the rule when we have your fix for CALCITE-1707.

> DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be 
> pushed to Druid
> ---------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1706
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1706
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> Enabling DruidAggregateFilterTransposeRule may cause very fine-grained 
> aggregations to be pushed to Druid.
> Running {{DruidAdapterIT.testFilterTimestamp}}, here is the previous plan 
> (with {{DruidAggregateFilterTransposeRule}} disabled):
> {noformat}
> EnumerableInterpreter
>   BindableAggregate(group=[{}], C=[COUNT()])
>     BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
>       DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
> {noformat}
> Here is the (in my opinion inferior) plan with 
> {{DruidAggregateFilterTransposeRule}} enabled:
> {noformat}
> EnumerableInterpreter
>   BindableAggregate(group=[{}], C=[$SUM0($1)])
>     BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
>       DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
> {noformat}
> Note that the DruidQuery is aggregating on __timestamp. Given that 
> __timestamp has very high cardinality, is this an efficient operation for 
> Druid?
> For this particular query, the ideal would be to push the filter into the 
> {{intervals}} clause. Then we would not need to group by __timestamp. I am 
> not sure why this is not happening.
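> For illustration only (this is not a plan that Calcite currently produces), 
> the ideal plan would look roughly like the following, with the two month 
> ranges pushed into {{intervals}} and no GROUP BY on __timestamp:
> {noformat}
> EnumerableInterpreter
>   BindableAggregate(group=[{}], C=[COUNT()])
>     DruidQuery(table=[[foodmart, foodmart]], intervals=[[1997-04-01T00:00:00.000/1997-05-01T00:00:00.000, 1997-06-01T00:00:00.000/1997-07-01T00:00:00.000]], projects=[[$0]])
> {noformat}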
> [~nishantbangarwa], [~bslim], How bad is the query with 
> {{DruidAggregateFilterTransposeRule}} enabled, in your opinion? Is this a 
> show-stopper for Calcite 1.12?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
