Karthikeyan Manivannan created DRILL-6896:
---------------------------------------------
Summary: Extraneous columns being projected in Drill 1.15
Key: DRILL-6896
URL: https://issues.apache.org/jira/browse/DRILL-6896
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.15.0
Reporter: Karthikeyan Manivannan
Assignee: Aman Sinha
[~rhou] noted that TPCH13 ran slower on Drill 1.15 than on Drill 1.14.
Analysis showed that 1.15 projects an extra column, and that the slowdown
comes from this column being pushed across an exchange unnecessarily.
Here is a simplified query written by [~amansinha100] that exhibits the same
problem, followed by its plans on both versions.
In the first plan (on 1.15.0), o_custkey and o_comment are both extraneous
projections.
The second plan (on 1.14.0) also has an extraneous projection, o_custkey,
but not o_comment.
On 1.15.0:
-------------
explain plan without implementation for
select
c.c_custkey
from
cp.`tpch/customer.parquet` c
left outer join cp.`tpch/orders.parquet` o
on c.c_custkey = o.o_custkey
and o.o_comment not like '%special%requests%'
;
DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
      DrillJoinRel(condition=[=($2, $0)], joinType=[right])
        DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
          DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
        DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
On 1.14.0:
-----------------
DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
      DrillJoinRel(condition=[=($1, $0)], joinType=[right])
        DrillProjectRel(o_custkey=[$0])
          DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
            DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
        DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
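
The extraneous projections called out above can be checked mechanically by
comparing the fields the projection above the join emits with the ordinals the
top projection actually reads. The following is a minimal sketch of that check;
the function name and the list-based plan encoding are hypothetical and are not
part of Drill's or Calcite's planner API.

```python
# Hypothetical helper for illustration only; not Drill or Calcite code.
def extraneous_fields(child_fields, parent_input_refs):
    """Return the child's output fields that the parent never references.

    child_fields: ordered output field names of the lower projection.
    parent_input_refs: input ordinals ($n) read by the upper projection.
    """
    used = set(parent_input_refs)
    return [name for i, name in enumerate(child_fields) if i not in used]


# In both plans the top projection is DrillProjectRel(c_custkey=[$0]),
# so it reads only ordinal 0 of the projection beneath it.
top_refs = [0]

# Output fields of the projection directly above the join, copied from the plans.
plan_1_15 = ["c_custkey", "o_custkey", "o_comment"]
plan_1_14 = ["c_custkey", "o_custkey"]

print("1.15.0:", extraneous_fields(plan_1_15, top_refs))
print("1.14.0:", extraneous_fields(plan_1_14, top_refs))
```

The sketch only inspects the projection directly above the join. It does not
model the join condition, which still needs o_custkey from the orders side, nor
the missing projection below the join on 1.15.0 that lets o_comment flow into it.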