[ 
https://issues.apache.org/jira/browse/DRILL-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718317#comment-16718317
 ] 

Aman Sinha commented on DRILL-6896:
-----------------------------------

On further analysis,  I can see a similar (not exactly same) extraneous column 
projection even on 1.14.0 and with an Inner Join query, with the filter 
condition in the WHERE clause (instead of ON clause).  

On 1.14.0: 
{noformat}
 explain plan without implementation for 
 select
   c.c_custkey
    from
       cp.`tpch/customer.parquet` c 
       inner join cp.`tpch/orders.parquet` o 
        on c.c_custkey = o.o_custkey
         where o.o_orderkey < 10 and c.c_name like 'abc%'
 ;
+------+------+
| text | json |
+------+------+
DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$2], c_name=[$3], o_custkey=[$0], 
o_orderkey=[$1])
      DrillJoinRel(condition=[=($2, $0)], joinType=[inner])
        DrillFilterRel(condition=[<($1, 10)])
          DrillScanRel(table=[[cp, tpch/orders.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/orders.parquet]], 
selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`o_custkey`, `o_orderkey`]]])
        DrillFilterRel(condition=[LIKE($1, 'abc%')])
          DrillScanRel(table=[[cp, tpch/customer.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`c_custkey`, `c_name`]]])
{noformat}

Notice above that the DrillProjectRel above the Join is projecting both c_name 
and o_orderkey even though both these columns are only needed by the filter 
conditions below the DrillJoinRel.  There are 2 possible reasons why they were 
not pushed down: 
1.  Costing.  For whatever reason, the projection pushdown did not change the 
cost, so Calcite preserved the original non-pushdown plan.  This is less likely 
compared to #2 below. 
2.  Potential issues with rule application.  There were some issues with 
Calcite's ProjectMergeRule (see 
https://issues.apache.org/jira/browse/CALCITE-2223) which required a workaround 
implemented in [DRILL-6212|https://issues.apache.org/jira/browse/DRILL-6212].  
A second JIRA: [DRILL-6501|https://issues.apache.org/jira/browse/DRILL-6501] 
was filed to revert to using ProjectMergeRule later. 

Considering that 1.14.0 also shows a similar issue (albeit not for the same 
query),  I am inclined to move this JIRA to 1.16.0 release.  

> Extraneous columns being projected past a join
> ----------------------------------------------
>
>                 Key: DRILL-6896
>                 URL: https://issues.apache.org/jira/browse/DRILL-6896
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.15.0
>            Reporter: Karthikeyan Manivannan
>            Assignee: Aman Sinha
>            Priority: Major
>
> [~rhou] noted that TPCH13 on Drill 1.15 was running slower than Drill 1.14. 
> Analysis revealed that an extra column was being projected in 1.15 and the 
> slowdown was because the extra column was being unnecessarily pushed across 
> an exchange.
> Here is a simplified query written by [~amansinha100] that exhibits the same 
> problem :
> In first plan, o_custkey and o_comment are both extraneous projections. 
>  In the second plan (on 1.14.0), also, there is an extraneous projection: 
> o_custkey but not o_comment.
> On 1.15.0:
> -------------
> {noformat}
> explain plan without implementation for 
>     select
>       c.c_custkey
>     from
>        cp.`tpch/customer.parquet` c 
>          left outer join cp.`tpch/orders.parquet` o 
>       on c.c_custkey = o.o_custkey
>      and o.o_comment not like '%special%requests%'
>    ;
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
>       DrillJoinRel(condition=[=($2, $0)], joinType=[right])
>         DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>           DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/customer.parquet]], 
> selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`c_custkey`]]])
> {noformat}
> On 1.14.0:
> -------------
> {noformat}
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
>       DrillJoinRel(condition=[=($1, $0)], joinType=[right])
>         DrillProjectRel(o_custkey=[$0])
>           DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>             DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/customer.parquet]], 
> selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`c_custkey`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to