[ 
https://issues.apache.org/jira/browse/DRILL-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6553:
---------------------------------------
    Labels: ready-to-commit  (was: )

> Fix TopN for unnest operator
> ----------------------------
>
>                 Key: DRILL-6553
>                 URL: https://issues.apache.org/jira/browse/DRILL-6553
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> Plan for the query with unnest is chosen non-optimally:
> {code:sql}
> select customer.c_custkey, customer.c_name, t.o.o_orderkey,t.o.o_totalprice
> from dfs.`lateraljoin/multipleFiles` customer,
> unnest(customer.c_orders) t(o)
> order by customer.c_custkey, t.o.o_orderkey, t.o.o_totalprice
> limit 50
> {code}
> Plan:
> {noformat}
> 00-00    Screen
> 00-01      ProjectAllowDup(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], 
> EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              Sort(sort0=[$0], sort1=[$2], sort2=[$3], dir0=[ASC], 
> dir1=[ASC], dir2=[ASC])
> 00-06                Project(c_custkey=[$2], c_name=[$3], EXPR$2=[ITEM($4, 
> 'o_orderkey')], EXPR$3=[ITEM($4, 'o_totalprice')])
> 00-07                  LateralJoin(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{1}])
> 00-09                    Project(T0¦¦**=[$0], c_orders=[$1], c_custkey=[$2], 
> c_name=[$3])
> 00-11                      Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles,
>  numFiles=2, columns=[`**`], 
> files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json,
>  
> file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> 00-08                    Project(c_orders0=[$0])
> 00-10                      Unnest [srcOp=00-07] 
> {noformat}
> A similar query, but with flatten:
> {code:sql}
> select f.c_custkey, f.c_name, f.o.o_orderkey, f.o.o_totalprice from (select 
> c_custkey, c_name, flatten(c_orders) as o from 
> dfs.`lateraljoin/multipleFiles` customer) f order by f.c_custkey, 
> f.o.o_orderkey, f.o.o_totalprice limit 50
> {code}
> has plan:
> {noformat}
> 00-00    Screen
> 00-01      Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              TopN(limit=[50])
> 00-06                Project(c_custkey=[$0], c_name=[$1], EXPR$2=[ITEM($2, 
> 'o_orderkey')], EXPR$3=[ITEM($2, 'o_totalprice')])
> 00-07                  Flatten(flattenField=[$2])
> 00-08                    Project(c_custkey=[$0], c_name=[$1], o=[$2])
> 00-09                      Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles,
>  numFiles=2, columns=[`c_custkey`, `c_name`, `c_orders`], 
> files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json,
>  
> file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> {noformat}
> The main difference is that for the case of unnest, a project wasn't pushed 
> to the scan and Limit with Sort weren't converted to TopN. 
> The first problem is tracked by DRILL-6545 and this Jira aims to fix the 
> problem with TopN



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to