[jira] [Created] (CALCITE-1706) DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid

Julian Hyde (JIRA) Fri, 17 Mar 2017 11:35:54 -0700

Julian Hyde created CALCITE-1706:
------------------------------------

             Summary: DruidAggregateFilterTransposeRule causes very 
fine-grained aggregations to be pushed to Druid
                 Key: CALCITE-1706
                 URL: https://issues.apache.org/jira/browse/CALCITE-1706
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde



Enabling DruidAggregateFilterTransposeRule may cause very fine-grained 
aggregations to be pushed to Druid.

Running {{DruidAdapterIT.testFilterTimestamp}}, here is the previous plan (with 
{{DruidAggregateFilterTransposeRule}} disabled):

{noformat}
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[COUNT()])
    BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
{noformat}

Here is the (in my opinion inferior) plan with 
{{DruidAggregateFilterTransposeRule}} enabled:

{noformat}
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[$SUM0($1)])
    BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
{noformat}

Note that the DruidQuery is aggregating on __timestamp. Given that __timestamp 
has very high cardinality, is this an efficient operation for Druid?
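
To make the concern concrete, here is a minimal Python sketch (not Calcite or Druid code) that mimics the two plan shapes on synthetic data. The data set, the {{keep}} predicate and all variable names are assumptions made for illustration; only the shape of the computation follows the plans above. Both shapes give the same count, but the second asks the grouping step to produce one group per distinct timestamp value.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical data: three rows per day for 1095 consecutive days starting
# 1996-01-01. This is NOT the real foodmart data set.
start = date(1996, 1, 1)
rows = [start + timedelta(days=d) for d in range(3 * 365) for _ in range(3)]

def keep(d):
    # Same predicate as the second plan: year = 1997 AND month IN (4, 6).
    return d.year == 1997 and d.month in (4, 6)

# Shape of the first plan: scan rows, filter, then COUNT().
count_direct = sum(1 for d in rows if keep(d))

# Shape of the second plan: group by timestamp with COUNT() first,
# then filter the groups and $SUM0 the partial counts.
partial = Counter(rows)  # one entry per distinct timestamp value
count_via_groups = sum(c for d, c in partial.items() if keep(d))

# Number of groups the grouping step has to produce.
groups_returned = len(partial)
```

In this toy data the timestamps only have day granularity, so the number of groups stays small. With finer-grained timestamps the group count would approach the row count.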

For this particular query, the ideal would be to push the filter into the 
{{intervals}} clause. Then we would not need to group by __timestamp. I am not 
sure why this is not happening.
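
As a rough illustration of what pushing the filter into {{intervals}} would mean for this query, the Python sketch below derives the two intervals implied by the filter (year 1997, months 4 and 6). The helper {{month_interval}} is hypothetical and is not part of Calcite or Druid; the output format simply copies the start/end notation shown in the plans above.

```python
from datetime import date

def month_interval(year, month):
    """Return a half-open start/end interval covering one calendar month."""
    start = date(year, month, 1)
    # The end bound is the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{start.isoformat()}T00:00:00.000/{end.isoformat()}T00:00:00.000"

# Filter from the plan: year = 1997 AND (month = 4 OR month = 6).
intervals = [month_interval(1997, m) for m in (4, 6)]
```

With intervals like these, the query would presumably need neither the residual filter nor the grouping on __timestamp.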

[~nishantbangarwa], [~bslim], how bad is the query with 
{{DruidAggregateFilterTransposeRule}} enabled, in your opinion? Is this a 
show-stopper for Calcite 1.12?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
