[
https://issues.apache.org/jira/browse/DRILL-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875897#comment-14875897
]
Victoria Markman commented on DRILL-3716:
-----------------------------------------
I've tested this with 1.2.0 and this particular case works now.
#Wed Sep 16 00:30:47 UTC 2015
git.commit.id.abbrev=9afcf61
Working on a regression test suite for filter pushdown.
{code}
0: jdbc:drill:schema=dfs> explain plan for select n_regionkey, cnt from
. . . . . . . . . . . . > (select n_regionkey, count(*) cnt
. . . . . . . . . . . . > from (select n.n_nationkey, n.n_regionkey, n.n_name
. . . . . . . . . . . . > from cp.`tpch/nation.parquet` n
. . . . . . . . . . . . > left join
. . . . . . . . . . . . > cp.`tpch/region.parquet` r
. . . . . . . . . . . . > on n.n_regionkey = r.r_regionkey)
. . . . . . . . . . . . > group by n_regionkey)
. . . . . . . . . . . . > where n_regionkey = 2;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(n_regionkey=[$0], cnt=[$1])
00-02 Project(n_regionkey=[$0], cnt=[$1])
00-03 StreamAgg(group=[{0}], cnt=[COUNT()])
00-04 Sort(sort0=[$0], dir0=[ASC])
00-05 Project(n_regionkey=[$0])
00-06 Project(n_regionkey=[$1], r_regionkey=[$0])
00-07 HashJoin(condition=[=($1, $0)], joinType=[right])
00-09 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]],
selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
columns=[`r_regionkey`]]])
00-08 SelectionVectorRemover
00-10 Filter(condition=[=($0, 2)])
00-11 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
columns=[`n_regionkey`]]])
{code}
> Drill should push filter past aggregate in order to improve query performance.
> ------------------------------------------------------------------------------
>
> Key: DRILL-3716
> URL: https://issues.apache.org/jira/browse/DRILL-3716
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.2.0
>
>
> For a query that has a filter on top of an aggregation, Drill currently
> does not push the filter past the aggregation. As a result, we may miss
> some optimization opportunities. For instance, such a filter could
> potentially be pushed into the scan if it qualifies for partition pruning.
> For the following query:
> {code}
> select n_regionkey, cnt from
> (select n_regionkey, count(*) cnt
> from (select n.n_nationkey, n.n_regionkey, n.n_name
> from cp.`tpch/nation.parquet` n
> left join
> cp.`tpch/region.parquet` r
> on n.n_regionkey = r.r_regionkey)
> group by n_regionkey)
> where n_regionkey = 2;
> {code}
> The current plan shows a filter (00-04) on top of the aggregation (00-05). A
> better plan would have the filter pushed past the aggregation.
> The root cause of this problem is that Drill's ruleset does not include
> FilterAggregateTransposeRule from the Calcite library.
> {code}
> 00-01 Project(n_regionkey=[$0], cnt=[$1])
> 00-02 Project(n_regionkey=[$0], cnt=[$1])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[=($0, 2)])
> 00-05 StreamAgg(group=[{0}], cnt=[COUNT()])
> 00-06 Project(n_regionkey=[$0])
> 00-07 MergeJoin(condition=[=($0, $1)], joinType=[left])
> 00-09 SelectionVectorRemover
> 00-11 Sort(sort0=[$0], dir0=[ASC])
> 00-13 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
> columns=[`n_regionkey`]]])
> 00-08 SelectionVectorRemover
> 00-10 Sort(sort0=[$0], dir0=[ASC])
> 00-12 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]],
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
> columns=[`r_regionkey`]]])
> {code}
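The pushdown is safe for this query because the predicate references only the grouping column (n_regionkey), so filtering before or after the aggregation keeps the same groups. The following is a minimal sketch of that equivalence, not Drill code: it uses Python's built-in sqlite3 module, and the nation/region tables and their rows are made-up stand-ins for the TPC-H parquet files. The second query is a hand-written rewrite with the filter moved below the aggregation, which may differ from what the planner rule actually produces.

```python
import sqlite3

# Toy stand-ins for the TPC-H tables (hypothetical rows, NOT the contents
# of cp.`tpch/nation.parquet` or cp.`tpch/region.parquet`).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT, n_regionkey INTEGER);
    CREATE TABLE region (r_regionkey INTEGER);
    INSERT INTO nation VALUES
        (0, 'A', 0), (1, 'B', 2), (2, 'C', 2), (3, 'D', 1), (4, 'E', 2);
    INSERT INTO region VALUES (0), (1), (2);
""")

# Shape from the issue: the filter sits on top of the aggregation.
FILTER_ABOVE_AGG = """
    SELECT n_regionkey, cnt FROM
      (SELECT n_regionkey, COUNT(*) AS cnt
       FROM (SELECT n.n_nationkey, n.n_regionkey, n.n_name
             FROM nation n
             LEFT JOIN region r
             ON n.n_regionkey = r.r_regionkey)
       GROUP BY n_regionkey)
    WHERE n_regionkey = 2
"""

# Hand-rewritten shape: the same predicate applied below the aggregation.
FILTER_BELOW_AGG = """
    SELECT n_regionkey, COUNT(*) AS cnt
    FROM (SELECT n.n_nationkey, n.n_regionkey, n.n_name
          FROM nation n
          LEFT JOIN region r
          ON n.n_regionkey = r.r_regionkey
          WHERE n.n_regionkey = 2)
    GROUP BY n_regionkey
"""

above = conn.execute(FILTER_ABOVE_AGG).fetchall()
below = conn.execute(FILTER_BELOW_AGG).fetchall()
print(above, below)  # both are [(2, 3)] for the toy rows above
```

A predicate on the aggregate output itself (for example, cnt > 1) could not be moved this way, since it depends on the result of the aggregation.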