[
https://issues.apache.org/jira/browse/DRILL-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829115#comment-15829115
]
Jinfeng Ni commented on DRILL-5200:
-----------------------------------
The filter is not pushed down because it refers to a column expanded from the
* column, and that expansion happens dynamically at execution time. This is a
known restriction in the optimizer rule Drill uses (extended from Calcite).
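One way to probe the restriction, offered as an untested sketch rather than a
confirmed workaround: name the columns array explicitly in both the inner and
outer query instead of selecting *, so the filter no longer refers to a column
that only exists after star expansion. Whether the Filter then lands below the
Sort is an assumption that depends on the Drill version and its rule set;
EXPLAIN PLAN FOR shows where it ends up.
{code}
-- Sketch only: same query as in the report, with the star replaced by the
-- explicit columns array. Placement of the Filter is not guaranteed.
explain plan for
select d.columns from (select columns from dfs.`/big-csv-file.csv` order by columns[0]) d
where d.columns[0] = 'bogus value';
{code}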
> Nested query fails to push filter down near scan
> ------------------------------------------------
>
> Key: DRILL-5200
> URL: https://issues.apache.org/jira/browse/DRILL-5200
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Consider the query described in DRILL-5198. The query was deliberately
> designed to do a full sort and discard results. Unfortunately, the query
> succeeded when it should not have been able to do so. The query:
> {code}
> select * from (select * from dfs.`/big-csv-file.csv` order by columns[0])d
> where d.columns[0] = 'bogus value';
> {code}
> The resulting plan follows. Note that the filter (which removes all rows)
> sits above the sort; it should be below.
> {code}
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 2.691360795E7,
> cumulative cost = {1.6444214457450001E9 rows, 2.6992589029593388E10 cpu, 0.0
> io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 459
> 00-01 Project(*=[$0]) : rowType = RecordType(ANY *): rowcount =
> 2.691360795E7, cumulative cost = {1.64173008495E9 rows, 2.698989766879839E10
> cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 458
> 00-02 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*):
> rowcount = 2.691360795E7, cumulative cost = {1.64173008495E9 rows,
> 2.698989766879839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9
> memory}, id = 457
> 00-03 Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0),
> 'ljdfhwuehnoiueyf')]) : rowType = RecordType(ANY T0¦¦*): rowcount =
> 2.691360795E7, cumulative cost = {1.614816477E9 rows, 2.696298406084839E10
> cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 456
> 00-04 Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*):
> rowcount = 1.79424053E8, cumulative cost = {1.435392424E9 rows,
> 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9
> memory}, id = 455
> 00-05 SingleMergeExchange(sort0=[1 ASC]) : rowType =
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost =
> {1.435392424E9 rows, 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12
> network, 2.870784848E9 memory}, id = 454
> 01-01 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*,
> ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {1.255968371E9 rows,
> 2.470224099304839E10 cpu, 0.0 io, 2.204762763264E12 network, 2.870784848E9
> memory}, id = 453
> 01-02 Sort(sort0=[$1], dir0=[ASC]) : rowType =
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost =
> {1.076544318E9 rows, 2.452281694004839E10 cpu, 0.0 io, 2.204762763264E12
> network, 2.870784848E9 memory}, id = 452
> 01-03 Project(T0¦¦*=[$0], EXPR$1=[$1]) : rowType =
> RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost =
> {8.97120265E8 rows, 4.844449431E9 cpu, 0.0 io, 2.204762763264E12 network, 0.0
> memory}, id = 451
> 01-04 HashToRandomExchange(dist0=[[$1]]) : rowType =
> RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount =
> 1.79424053E8, cumulative cost = {8.97120265E8 rows, 4.844449431E9 cpu, 0.0
> io, 2.204762763264E12 network, 0.0 memory}, id = 450
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY
> T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8,
> cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 449
> 03-01 Project(T0¦¦*=[$0], EXPR$1=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY
> T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8,
> cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 448
> 03-02 Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) :
> rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8,
> cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 447
> 03-03 Project(T0¦¦*=[$0], columns=[$1]) :
> rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8,
> cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 446
> 03-04 Scan(groupscan=[EasyGroupScan
> [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl,
> numFiles=1, columns=[`*`],
> files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]])
> : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8,
> cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 445
> {code}
> What should have happened is that the filter is pushed down near the scan.
> It is likely that the clever nested query structure used in the query
> tricks the planner into missing an optimization opportunity.