[jira] [Commented] (DRILL-5200) Nested query fails to push filter down near scan

Jinfeng Ni (JIRA) Wed, 18 Jan 2017 17:23:13 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829115#comment-15829115
 ]


Jinfeng Ni commented on DRILL-5200:
-----------------------------------

The filter is not pushed down because it refers to a column expanded from the 
* column, and that expansion happens dynamically at execution time. This is a 
known restriction in the optimizer rule Drill uses (extended from Calcite). 
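
As an illustration only (this is not Drill or Calcite code), the Python sketch 
below shows why placing an equality filter below a sort would not change the 
result, while the sort would see far fewer rows. The function names and sample 
rows are hypothetical and stand in for the Filter and Sort operators in the 
plan quoted below.

```python
def filter_after_sort(rows, key_index, value):
    """Mimics the reported plan: sort every row, then apply the filter."""
    ordered = sorted(rows, key=lambda r: r[key_index])
    return [r for r in ordered if r[key_index] == value]


def filter_before_sort(rows, key_index, value):
    """Mimics the pushed-down plan: filter first, then sort the survivors."""
    kept = [r for r in rows if r[key_index] == value]
    return sorted(kept, key=lambda r: r[key_index])


if __name__ == "__main__":
    # Hypothetical sample data; column 0 plays the role of columns[0].
    sample = [("b", 1), ("a", 2), ("b", 3), ("c", 4)]
    # Both orderings of the operators return the same rows.
    print(filter_after_sort(sample, 0, "b") == filter_before_sort(sample, 0, "b"))
    # A filter that matches nothing leaves the pushed-down sort with no input.
    print(filter_before_sort(sample, 0, "bogus value"))
```

Both functions return identical output for any input because Python's sort is 
stable; the only difference is how many rows reach the sort step.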
 

> Nested query fails to push filter down near scan
> ------------------------------------------------
>
>                 Key: DRILL-5200
>                 URL: https://issues.apache.org/jira/browse/DRILL-5200
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the query described in DRILL-5198. The query was deliberately 
> designed to do a full sort and discard results. Unfortunately, the query 
> succeeded when it should not have been able to do so. The query:
> {code}
> select * from (select * from dfs.`/big-csv-file.csv` order by columns[0])d 
> where d.columns[0] = 'bogus value';
> {code}
> The resulting plan follows. Note that the filter (which removes all rows) is 
> above the sort; it should be below.
> {code}
> 00-00    Screen : rowType = RecordType(ANY *): rowcount = 2.691360795E7, 
> cumulative cost = {1.6444214457450001E9 rows, 2.6992589029593388E10 cpu, 0.0 
> io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 459
> 00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 
> 2.691360795E7, cumulative cost = {1.64173008495E9 rows, 2.698989766879839E10 
> cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 458
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*): 
> rowcount = 2.691360795E7, cumulative cost = {1.64173008495E9 rows, 
> 2.698989766879839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 
> memory}, id = 457
> 00-03          Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0), 
> 'ljdfhwuehnoiueyf')]) : rowType = RecordType(ANY T0¦¦*): rowcount = 
> 2.691360795E7, cumulative cost = {1.614816477E9 rows, 2.696298406084839E10 
> cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 456
> 00-04            Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): 
> rowcount = 1.79424053E8, cumulative cost = {1.435392424E9 rows, 
> 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 
> memory}, id = 455
> 00-05              SingleMergeExchange(sort0=[1 ASC]) : rowType = 
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = 
> {1.435392424E9 rows, 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12 
> network, 2.870784848E9 memory}, id = 454
> 01-01                SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, 
> ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {1.255968371E9 rows, 
> 2.470224099304839E10 cpu, 0.0 io, 2.204762763264E12 network, 2.870784848E9 
> memory}, id = 453
> 01-02                  Sort(sort0=[$1], dir0=[ASC]) : rowType = 
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = 
> {1.076544318E9 rows, 2.452281694004839E10 cpu, 0.0 io, 2.204762763264E12 
> network, 2.870784848E9 memory}, id = 452
> 01-03                    Project(T0¦¦*=[$0], EXPR$1=[$1]) : rowType = 
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = 
> {8.97120265E8 rows, 4.844449431E9 cpu, 0.0 io, 2.204762763264E12 network, 0.0 
> memory}, id = 451
> 01-04                      HashToRandomExchange(dist0=[[$1]]) : rowType = 
> RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 1.79424053E8, cumulative cost = {8.97120265E8 rows, 4.844449431E9 cpu, 0.0 
> io, 2.204762763264E12 network, 0.0 memory}, id = 450
> 02-01                        UnorderedMuxExchange : rowType = RecordType(ANY 
> T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, 
> cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 449
> 03-01                          Project(T0¦¦*=[$0], EXPR$1=[$1], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY 
> T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, 
> cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 448
> 03-02                            Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : 
> rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, 
> cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 447
> 03-03                              Project(T0¦¦*=[$0], columns=[$1]) : 
> rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, 
> cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 446
> 03-04                                Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl,
>  numFiles=1, columns=[`*`], 
> files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]])
>  : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, 
> cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 445
> {code}
> What should have happened is that the filter was pushed down near the scan. 
> It is likely that the clever nested query structure used in the query 
> tricks the planner into missing an optimization opportunity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
