[ 
https://issues.apache.org/jira/browse/DRILL-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-885:
---------------------------------

    Component/s: Query Planning & Optimization
       Assignee: Jinfeng Ni

> Handle project pushdown for constant expressions
> ------------------------------------------------
>
>                 Key: DRILL-885
>                 URL: https://issues.apache.org/jira/browse/DRILL-885
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> In the following query, the Explain plan shows that the node 
> Project($f0=[0]) projects only a constant, so ideally neither side of the 
> join should have to produce any columns other than those needed for the 
> join condition. However, the Scan below currently produces those 
> unnecessary columns anyway (see the Customer parquet scan on the left side 
> of the HashJoin). This hurts performance.
> 0: jdbc:drill:zk=local> explain plan for select count(*) from (select 
> c.c_custkey, c.c_name, c.c_address, c.c_nationkey,  c.c_phone, c.c_acctbal, 
> c.c_mktsegment, c.c_comment, n.n_nationkey, n.n_name, n.n_nationkey, 
> n.n_comment from cp.`tpch/customer.parquet` c JOIN cp.`tpch/nation.parquet` n 
> ON (c.c_nationkey = n.n_nationkey));
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[SUM($0)])
> 00-02        UnionExchange
> 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02            Project($f0=[0])
> 01-03              HashJoin(condition=[=($1, $10)], joinType=[inner])
> 01-05                HashToRandomExchange(dist0=[[$1]])
> 02-01                  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]], 
> selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_nationkey`], 
> SchemaPath [`c_custkey`], SchemaPath [`c_name`], SchemaPath [`c_address`], 
> SchemaPath [`c_phone`], SchemaPath [`c_acctbal`], SchemaPath 
> [`c_mktsegment`], SchemaPath [`c_comment`]]]])
> 01-04                Project(*0=[$0], n_nationkey=[$1], n_name=[$2], 
> n_comment=[$3])
> 01-06                  HashToRandomExchange(dist0=[[$1]])
> 03-01                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], 
> selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`], 
> SchemaPath [`n_name`], SchemaPath [`n_comment`]]]])
> Here's the Drill Logical plan for the same query:
> | DrillScreenRel
>   DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
>     DrillProjectRel($f0=[0])
>       DrillJoinRel(condition=[=($1, $10)], joinType=[inner])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]])
>         DrillScanRel(table=[[cp, tpch/nation.parquet]])
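
The pruning the report asks for can be illustrated with a minimal sketch. This is not Drill's or the planner's actual code: the function name prune_join_inputs is hypothetical, and the field layout (a leading `*` field at index 0 on each side, so that $1 and $10 resolve to c_nationkey and n_nationkey) is an assumption inferred from the plan text above. It only shows that when the Project above the join references no input fields, the fields needed below the join reduce to those in the join condition.

```python
def prune_join_inputs(left_fields, right_fields, project_refs, condition_refs):
    """Return the fields each join input must produce.

    Indexes follow the plan's convention: the join output row is the left
    input's fields followed by the right input's fields.
    """
    # A field is needed if the parent Project or the join condition uses it.
    needed = sorted(set(project_refs) | set(condition_refs))
    n_left = len(left_fields)
    left = [left_fields[i] for i in needed if i < n_left]
    right = [right_fields[i - n_left] for i in needed if i >= n_left]
    return left, right


# Assumed layout: index 0 on each side is the star field seen as *0 in the plan.
left_fields = ["*", "c_nationkey", "c_custkey", "c_name", "c_address",
               "c_phone", "c_acctbal", "c_mktsegment", "c_comment"]
right_fields = ["*", "n_nationkey", "n_name", "n_comment"]

# Project($f0=[0]) is a literal, so it references no input fields;
# the join condition =($1, $10) references fields 1 and 10.
left, right = prune_join_inputs(left_fields, right_fields,
                                project_refs=[], condition_refs=[1, 10])
print(left, right)
```

Under these assumptions, each scan would only need to read its join key.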



--
This message was sent by Atlassian JIRA
(v6.2#6252)