[ 
https://issues.apache.org/jira/browse/DRILL-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-6896:
------------------------------
    Summary: Extraneous columns being projected past a join  (was: Extraneous 
columns being projected in Drill 1.15)

> Extraneous columns being projected past a join
> ----------------------------------------------
>
>                 Key: DRILL-6896
>                 URL: https://issues.apache.org/jira/browse/DRILL-6896
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.15.0
>            Reporter: Karthikeyan Manivannan
>            Assignee: Aman Sinha
>            Priority: Major
>
> [~rhou] noted that TPCH13 ran slower on Drill 1.15 than on Drill 1.14. 
> Analysis revealed that 1.15 projects an extra column, and the slowdown 
> comes from that column being pushed unnecessarily across an exchange.
> Here is a simplified query written by [~amansinha100] that exhibits the same 
> problem.
> In the first plan (on 1.15.0), o_custkey and o_comment are both extraneous 
> projections. 
> In the second plan (on 1.14.0), there is also an extraneous projection, 
> o_custkey, but not o_comment.
> On 1.15.0:
> -------------
> explain plan without implementation for 
>  select
>  c.c_custkey
>  from
>  cp.`tpch/customer.parquet` c 
>  left outer join cp.`tpch/orders.parquet` o 
>  on c.c_custkey = o.o_custkey
>  and o.o_comment not like '%special%requests%'
>  ;
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
>       DrillJoinRel(condition=[=($2, $0)], joinType=[right])
>         DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>           DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan 
>           [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
>           selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
>           usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan 
>         [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
>         selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, 
>         usedMetadataFile=false, columns=[`c_custkey`]]])
> On 1.14.0:
> -----------------
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
>       DrillJoinRel(condition=[=($1, $0)], joinType=[right])
>         DrillProjectRel(o_custkey=[$0])
>           DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>             DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan 
>             [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
>             selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
>             usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan 
>         [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
>         selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, 
>         usedMetadataFile=false, columns=[`c_custkey`]]])
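
The difference between the two plans can be checked mechanically. The following is a minimal sketch in Python, not Drill's planner code: it treats each projection as a plain list of column names copied from the DrillProjectRel nodes in the plans above, and the helper name split_projection is hypothetical. It reports which columns emitted by the project directly above the join are never read by the top-level project.

```python
def split_projection(emitted, referenced_by_parent):
    """Split the columns a project emits into needed and extraneous ones.

    emitted: column names output by the project directly above the join.
    referenced_by_parent: column names the parent project actually reads.
    """
    needed = [col for col in emitted if col in referenced_by_parent]
    extraneous = [col for col in emitted if col not in referenced_by_parent]
    return needed, extraneous


# Column lists copied from the DrillProjectRel nodes in the plans above.
top_project = {"c_custkey"}
project_above_join_1_15 = ["c_custkey", "o_custkey", "o_comment"]
project_above_join_1_14 = ["c_custkey", "o_custkey"]

for version, emitted in [("1.15.0", project_above_join_1_15),
                         ("1.14.0", project_above_join_1_14)]:
    needed, extraneous = split_projection(emitted, top_project)
    print(version, "needed:", needed, "extraneous:", extraneous)
```

Under these assumptions the sketch flags o_custkey and o_comment for 1.15.0 and only o_custkey for 1.14.0, which matches the description in the issue.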



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
