Paul Rogers created DRILL-5199:
----------------------------------
Summary: Planner inserts three projects when one will do
Key: DRILL-5199
URL: https://issues.apache.org/jira/browse/DRILL-5199
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Paul Rogers
Priority: Minor
See the query and description for DRILL-5198. The plan for that query has a
number of opportunities for improvement. This bug covers a minor one: the
plan contains three consecutive project operators where a single project
would probably work just as well (and would be somewhat more efficient).
Here is the subset of the plan in question:
{code}
02-01 UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 449
03-01 Project(T0¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
03-02 Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
03-03 Project(T0¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
03-04 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
{code}
This issue is minor because project is a relatively inexpensive operation
(it inserts or removes a vector batch-by-batch rather than working
row-by-row). Still, every little bit of optimization helps.
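To illustrate what a single combined project could look like, here is a minimal sketch of merging stacked projects by substitution: each input reference ($n) in an upper project is replaced by the expression the project below it computes for column n. This is only an illustration and not Drill's planner code. The Ref, Lit, and Call classes and the merge_projects and render helpers are hypothetical names invented for this example; only the column names and expressions are taken from the plan above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ref:
    """Reference to input column n, written $n in the plan."""
    index: int


@dataclass(frozen=True)
class Lit:
    """Literal constant, such as the 0 in ITEM($1, 0)."""
    value: object


@dataclass(frozen=True)
class Call:
    """Function call over sub-expressions."""
    op: str
    args: tuple


# A project is an ordered list of (output name, expression) pairs.
Project = List[Tuple[str, object]]


def substitute(expr, below):
    """Rewrite expr so every $n becomes the n-th expression of the project below."""
    if isinstance(expr, Ref):
        return below[expr.index]
    if isinstance(expr, Call):
        return Call(expr.op, tuple(substitute(a, below) for a in expr.args))
    return expr  # literals are unchanged


def merge_projects(top: Project, bottom: Project) -> Project:
    """Collapse two stacked projects into one equivalent project."""
    below = [e for _, e in bottom]
    return [(name, substitute(e, below)) for name, e in top]


def render(expr) -> str:
    """Format an expression in the style used by the plan text."""
    if isinstance(expr, Ref):
        return f"${expr.index}"
    if isinstance(expr, Lit):
        return str(expr.value)
    return f"{expr.op}({', '.join(render(a) for a in expr.args)})"


# The three projects from the plan, transcribed by hand.
p_03_03 = [("T0¦¦*", Ref(0)), ("columns", Ref(1))]
p_03_02 = [("T0¦¦*", Ref(0)), ("EXPR$1", Call("ITEM", (Ref(1), Lit(0))))]
p_03_01 = [
    ("T0¦¦*", Ref(0)),
    ("EXPR$1", Ref(1)),
    ("E_X_P_R_H_A_S_H_F_I_E_L_D", Call("hash32AsDouble", (Ref(1),))),
]

# Merge bottom-up: 03-02 over 03-03, then 03-01 over that result.
merged = merge_projects(p_03_01, merge_projects(p_03_02, p_03_03))

for name, expr in merged:
    print(f"{name}=[{render(expr)}]")
```

In this sketch the merged project refers to the scan's columns directly, and the hash column becomes hash32AsDouble(ITEM($1, 0)). Naive substitution repeats ITEM($1, 0) in two output columns, so a real implementation would presumably want to evaluate that shared sub-expression once; whether the merge is a net win in Drill is an assumption this example does not test.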