Zain Humayun created CALCITE-1828:
-------------------------------------
Summary: Push the FILTER clause into Druid as a Filtered
Aggregator
Key: CALCITE-1828
URL: https://issues.apache.org/jira/browse/CALCITE-1828
Project: Calcite
Issue Type: Improvement
Components: druid
Affects Versions: 1.12.0
Reporter: Zain Humayun
Assignee: Zain Humayun
Druid supports a special aggregator it calls the [Filtered
Aggregator|http://druid.io/docs/latest/querying/aggregations.html], which lets
an aggregation be computed under its own filter, independent of the other
filters in the Druid query.
An example where the filtered aggregator is useful:
{code:sql}
SELECT
sum("col1") FILTER (WHERE <condition1>),
sum("col2") FILTER (WHERE <condition2>)
FROM "table";
{code}
Currently, Calcite scans Druid, then does the filtering and aggregation
itself. With filtered aggregators, both the filter and the aggregation can be
pushed into Druid.
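As a rough sketch of what the pushed-down form could look like, the fragment below shows one entry of the {{aggregations}} array in a Druid native query for the first aggregate above. The condition {{"dim1" = 'a'}}, the {{longSum}} type (which assumes {{col1}} is a long metric), and the output name {{EXPR$0}} are placeholders chosen for illustration; they are not taken from the query or from Calcite's actual output.
{code}
{
  "type": "filtered",
  "filter": {
    "type": "selector",
    "dimension": "dim1",
    "value": "a"
  },
  "aggregator": {
    "type": "longSum",
    "name": "EXPR$0",
    "fieldName": "col1"
  }
}
{code}
Each FILTER clause would map to its own {{filtered}} wrapper, so the two sums can carry different conditions within a single Druid query.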
*A few comments/questions:*
1) If all conditions in the FILTER clauses are the same, then instead of
pushing filtered aggregators individually, it would make more sense to push a
single filter into the Druid query, i.e. the filters can be factored out into
one filter. I don't see Calcite currently doing this. Does it already have such
a rule in place?
2) The filters can/should only be pushed if they filter on dimension
columns.
3) Currently, the above query produces the following relation:
DruidQuery -> Project -> Aggregate. There is already a rule called
{{DruidAggregateProjectRule}} which matches that relation. Is it better
to add logic to that rule, or to create a new rule that also matches that
relation?
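To make point 1 concrete, here is a sketch of the factoring for a hypothetical case in which both FILTER clauses use the same condition ({{"dim1" = 'a'}} is an illustrative placeholder, not part of the original query):
{code:sql}
-- Hypothetical case: both FILTER clauses use the same condition.
SELECT
sum("col1") FILTER (WHERE "dim1" = 'a'),
sum("col2") FILTER (WHERE "dim1" = 'a')
FROM "table";

-- Same result with the condition factored out into a single WHERE,
-- which could then be pushed as one query-level Druid filter.
SELECT
sum("col1"),
sum("col2")
FROM "table"
WHERE "dim1" = 'a';
{code}
The rewrite is only safe when every aggregate call carries the same filter. With a GROUP BY it can also change the result: a WHERE drops groups that have no matching rows, whereas FILTER keeps them with a NULL sum, so a rule doing this factoring would presumably need to check for that.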
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)