[
https://issues.apache.org/jira/browse/DRILL-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804523#comment-14804523
]
Jinfeng Ni commented on DRILL-2748:
-----------------------------------
DrillAggregateRel, which is the logical aggregate operator, currently calls
makeTinyCost() regardless of how many rows the input has or how many aggregation
functions it contains. As a result, the cost estimate cannot reflect the
savings when the filter is pushed past the aggregate operator.
The fix is to improve the cost estimation for the logical aggregate. As with the
join operator, we use the same formula for HashAgg and the logical aggregate,
under the assumption that HashAgg is more likely to be chosen in physical planning.
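
To illustrate why a constant tiny cost hides the benefit of the pushdown, here
is a minimal toy sketch. It is not Drill's code: the class name, the per-row
weights, the row count, the filter selectivity, and the group ratio are all
hypothetical values picked for illustration. It only compares a constant
aggregate cost against a HashAgg-style cost that scales with input rows,
grouping keys, and aggregate calls.

```java
// Toy cost model, NOT Drill's implementation. All constants are assumed.
class AggCostSketch {
    // Hypothetical per-row CPU weights (assumptions, not Drill's constants).
    static final double HASH_CPU_COST = 8.0;     // per grouping key, per row
    static final double FUNC_CPU_COST = 12.0;    // per aggregate call, per row
    static final double COMPARE_CPU_COST = 4.0;  // per filtered row
    static final double TINY_COST = 1.0;         // stand-in for a constant tiny cost

    // Hypothetical workload: 1M rows, filter keeps 1%, aggregate halves the rows.
    static final double SCAN_ROWS = 1_000_000;
    static final double FILTER_SELECTIVITY = 0.01;
    static final double GROUP_RATIO = 0.5;
    static final int GROUP_KEYS = 2;
    static final int AGG_CALLS = 2;

    // Cost of the aggregate: constant, or proportional to its input size.
    static double aggCost(double inputRows, boolean tinyAgg) {
        if (tinyAgg) {
            return TINY_COST;
        }
        return inputRows * (HASH_CPU_COST * GROUP_KEYS + FUNC_CPU_COST * AGG_CALLS);
    }

    static double filterCost(double inputRows) {
        return inputRows * COMPARE_CPU_COST;
    }

    // Total cost of filter + aggregate, with the filter above or below the aggregate.
    static double planCost(boolean pushFilter, boolean tinyAgg) {
        if (pushFilter) {
            // Filter reads the full scan; aggregate sees only the surviving rows.
            double filteredRows = SCAN_ROWS * FILTER_SELECTIVITY;
            return filterCost(SCAN_ROWS) + aggCost(filteredRows, tinyAgg);
        }
        // Aggregate reads the full scan; filter sees the aggregate's output.
        double aggOutputRows = SCAN_ROWS * GROUP_RATIO;
        return aggCost(SCAN_ROWS, tinyAgg) + filterCost(aggOutputRows);
    }

    public static void main(String[] args) {
        System.out.println("tiny agg cost, filter above agg: " + planCost(false, true));
        System.out.println("tiny agg cost, filter below agg: " + planCost(true, true));
        System.out.println("hash agg cost, filter above agg: " + planCost(false, false));
        System.out.println("hash agg cost, filter below agg: " + planCost(true, false));
    }
}
```

Under these assumed numbers, the constant cost makes the plan with the filter
above the aggregate look cheaper, because the filter then processes fewer rows
and the aggregate is treated as free. Once the aggregate cost scales with its
input, the plan with the filter below the aggregate becomes the cheaper one.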
> Filter is not pushed down into subquery with the group by
> ---------------------------------------------------------
>
> Key: DRILL-2748
> URL: https://issues.apache.org/jira/browse/DRILL-2748
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Affects Versions: 0.9.0, 1.0.0, 1.1.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Fix For: 1.2.0
>
> Attachments:
> 0001-DRILL-2748-Add-optimizer-rule-to-push-filter-past-ag.patch
>
>
> I'm not sure about this one, but theoretically the filter could have been
> pushed into the subquery.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from (select a1, b1, avg(a1) from t1 group by a1, b1) as sq(x, y, z) where x = 10;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(x=[$0], y=[$1], z=[$2])
> 00-02 Project(x=[$0], y=[$1], z=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[=($0, 10)])
> 00-05 HashAgg(group=[{0, 1}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-06 Project(a1=[$1], b1=[$0])
> 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`]]])
> {code}
> Same with distinct in subquery:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select distinct a1, b1, c1 from t1 ) as sq(x, y, z) where x = 10;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(x=[$0], y=[$1], z=[$2])
> 00-02 Project(x=[$0], y=[$1], z=[$2])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[=($0, 10)])
> 00-05 HashAgg(group=[{0, 1, 2}])
> 00-06 Project(a1=[$2], b1=[$1], c1=[$0])
> 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> {code}