[ https://issues.apache.org/jira/browse/DRILL-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau updated DRILL-885:
---------------------------------
Component/s: Query Planning & Optimization
Assignee: Jinfeng Ni
> Handle project pushdown for constant expressions
> ------------------------------------------------
>
> Key: DRILL-885
> URL: https://issues.apache.org/jira/browse/DRILL-885
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Aman Sinha
> Assignee: Jinfeng Ni
>
> In the following query, the Explain plan shows that the node
> Project($f0=[0]) projects only a constant, so ideally we should not have to
> produce any columns from either side of the join except those needed for
> the join condition. However, we currently still produce the unnecessary
> columns from the Scan below (see the Customer parquet scan on the left side
> of the HashJoin, which reads all eight selected columns although only
> c_nationkey is used). This hurts performance.
> 0: jdbc:drill:zk=local> explain plan for select count(*) from (select
> c.c_custkey, c.c_name, c.c_address, c.c_nationkey, c.c_phone, c.c_acctbal,
> c.c_mktsegment, c.c_comment, n.n_nationkey, n.n_name, n.n_nationkey,
> n.n_comment from cp.`tpch/customer.parquet` c JOIN cp.`tpch/nation.parquet` n
> ON (c.c_nationkey = n.n_nationkey));
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01   StreamAgg(group=[{}], EXPR$0=[SUM($0)])
> 00-02     UnionExchange
> 01-01       StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02         Project($f0=[0])
> 01-03           HashJoin(condition=[=($1, $10)], joinType=[inner])
> 01-05             HashToRandomExchange(dist0=[[$1]])
> 02-01               Scan(groupscan=[ParquetGroupScan
>                       [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]],
>                       selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_nationkey`],
>                       SchemaPath [`c_custkey`], SchemaPath [`c_name`], SchemaPath [`c_address`],
>                       SchemaPath [`c_phone`], SchemaPath [`c_acctbal`], SchemaPath
>                       [`c_mktsegment`], SchemaPath [`c_comment`]]]])
> 01-04             Project(*0=[$0], n_nationkey=[$1], n_name=[$2],
>                     n_comment=[$3])
> 01-06               HashToRandomExchange(dist0=[[$1]])
> 03-01                 Scan(groupscan=[ParquetGroupScan
>                         [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]],
>                         selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`],
>                         SchemaPath [`n_name`], SchemaPath [`n_comment`]]]])
> Here's the Drill Logical plan for the same query:
> | DrillScreenRel
>   DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
>     DrillProjectRel($f0=[0])
>       DrillJoinRel(condition=[=($1, $10)], joinType=[inner])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]])
>         DrillScanRel(table=[[cp, tpch/nation.parquet]])
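To illustrate the expected behavior, here is a minimal Python sketch of the column-pruning idea described in the issue. It is not Drill's planner code: the `Scan` and `Join` classes and the `prune_scan_columns` function are hypothetical names invented for this example. It only shows that when the projection above the join references no input columns (a constant), each scan needs just its join key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Scan:
    """Hypothetical stand-in for a table scan and the columns it currently reads."""
    table: str
    columns: List[str]


@dataclass
class Join:
    """Hypothetical stand-in for an inner equi-join on one key per side."""
    left: Scan
    right: Scan
    left_key: str
    right_key: str


def prune_scan_columns(join: Join, projected: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the columns referenced above the join, plus the join keys."""
    needed = set(projected)
    left_cols = [c for c in join.left.columns if c in needed or c == join.left_key]
    right_cols = [c for c in join.right.columns if c in needed or c == join.right_key]
    return {join.left.table: left_cols, join.right.table: right_cols}


# Column lists taken from the two Scan nodes in the Explain plan above.
customer = Scan("tpch/customer.parquet", [
    "c_nationkey", "c_custkey", "c_name", "c_address",
    "c_phone", "c_acctbal", "c_mktsegment", "c_comment",
])
nation = Scan("tpch/nation.parquet", ["n_nationkey", "n_name", "n_comment"])
join = Join(customer, nation, "c_nationkey", "n_nationkey")

# Project($f0=[0]) references no input columns, so nothing is projected.
pruned = prune_scan_columns(join, projected=[])
print(pruned)
```

Under these assumptions, the customer scan would read only `c_nationkey` and the nation scan only `n_nationkey`, rather than the eight and three columns shown in the plan.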
--
This message was sent by Atlassian JIRA
(v6.2#6252)