[ 
https://issues.apache.org/jira/browse/DRILL-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531992#comment-16531992
 ] 

ASF GitHub Bot commented on DRILL-6553:
---------------------------------------

HanumathRao commented on issue #1353: DRILL-6553: Fix TopN for unnest operator
URL: https://github.com/apache/drill/pull/1353#issuecomment-402303904
 
 
   @vvysotskyi Thanks for making the changes. Changes look good to me. +1.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Fix TopN for unnest operator
> ----------------------------
>
>                 Key: DRILL-6553
>                 URL: https://issues.apache.org/jira/browse/DRILL-6553
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Plan for the query with unnest is chosen non-optimally:
> {code:sql}
> select customer.c_custkey, customer.c_name, t.o.o_orderkey,t.o.o_totalprice
> from dfs.`lateraljoin/multipleFiles` customer,
> unnest(customer.c_orders) t(o)
> order by customer.c_custkey, t.o.o_orderkey, t.o.o_totalprice
> limit 50
> {code}
> Plan:
> {noformat}
> 00-00    Screen
> 00-01      ProjectAllowDup(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], 
> EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              Sort(sort0=[$0], sort1=[$2], sort2=[$3], dir0=[ASC], 
> dir1=[ASC], dir2=[ASC])
> 00-06                Project(c_custkey=[$2], c_name=[$3], EXPR$2=[ITEM($4, 
> 'o_orderkey')], EXPR$3=[ITEM($4, 'o_totalprice')])
> 00-07                  LateralJoin(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{1}])
> 00-09                    Project(T0¦¦**=[$0], c_orders=[$1], c_custkey=[$2], 
> c_name=[$3])
> 00-11                      Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles,
>  numFiles=2, columns=[`**`], 
> files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json,
>  
> file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> 00-08                    Project(c_orders0=[$0])
> 00-10                      Unnest [srcOp=00-07] 
> {noformat}
> A similar query, but with flatten:
> {code:sql}
> select f.c_custkey, f.c_name, f.o.o_orderkey, f.o.o_totalprice from (select 
> c_custkey, c_name, flatten(c_orders) as o from 
> dfs.`lateraljoin/multipleFiles` customer) f order by f.c_custkey, 
> f.o.o_orderkey, f.o.o_totalprice limit 50
> {code}
> has plan:
> {noformat}
> 00-00    Screen
> 00-01      Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              TopN(limit=[50])
> 00-06                Project(c_custkey=[$0], c_name=[$1], EXPR$2=[ITEM($2, 
> 'o_orderkey')], EXPR$3=[ITEM($2, 'o_totalprice')])
> 00-07                  Flatten(flattenField=[$2])
> 00-08                    Project(c_custkey=[$0], c_name=[$1], o=[$2])
> 00-09                      Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles,
>  numFiles=2, columns=[`c_custkey`, `c_name`, `c_orders`], 
> files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json,
>  
> file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> {noformat}
> The main difference is that for the case of unnest, a project wasn't pushed 
> to the scan and Limit with Sort weren't converted to TopN. 
> The first problem is tracked by DRILL-6545 and this Jira aims to fix the 
> problem with TopN



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to