Paul Rogers created DRILL-5199:
----------------------------------

             Summary: Planner inserts three projects when one will do
                 Key: DRILL-5199
                 URL: https://issues.apache.org/jira/browse/DRILL-5199
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor


See the query and description in DRILL-5198. The plan for that query has a 
number of opportunities for improvement. This bug covers a minor one: the plan 
contains three project operators in series where a single project would 
probably work just as well (and would be somewhat more efficient).

Here is the subset of the plan in question:

{code}
02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 449
03-01                          Project(T0¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
03-02                            Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
03-03                              Project(T0¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
{code}

This issue is minor because project is a relatively inexpensive operation 
(it inserts or removes a vector once per batch, rather than doing work row by 
row). Still, every little bit of optimization helps.
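
To illustrate what a single combined project could look like, here is a minimal 
sketch in Python. It is not Drill or Calcite code: the tuple-based expression 
encoding and the helper names (substitute, merge_projects, render) are 
hypothetical and exist only for this example. It folds each project into the 
one beneath it by replacing $i references with the child's expressions, using 
the three projects from the plan above as input.

```python
# Hypothetical sketch: expressions are nested tuples, not Drill or Calcite classes.
# ("ref", i) is input column $i, ("lit", v) is a literal,
# ("call", name, args) is a function call.

def substitute(expr, inputs):
    """Rewrite expr so its $i references become the child's expressions."""
    kind = expr[0]
    if kind == "ref":
        return inputs[expr[1]]
    if kind == "call":
        return ("call", expr[1], tuple(substitute(a, inputs) for a in expr[2]))
    return expr  # literals pass through unchanged


def merge_projects(upper, lower):
    """Fold an upper project into the project directly beneath it."""
    lower_exprs = [e for _, e in lower]
    return [(name, substitute(e, lower_exprs)) for name, e in upper]


def render(expr):
    """Format an expression the way the plan text writes it."""
    kind = expr[0]
    if kind == "ref":
        return f"${expr[1]}"
    if kind == "lit":
        return str(expr[1])
    return f"{expr[1]}({', '.join(render(a) for a in expr[2])})"


# The three projects from the plan, bottom (03-03) to top (03-01).
p_03_03 = [("T0¦¦*", ("ref", 0)), ("columns", ("ref", 1))]
p_03_02 = [("T0¦¦*", ("ref", 0)),
           ("EXPR$1", ("call", "ITEM", (("ref", 1), ("lit", 0))))]
p_03_01 = [("T0¦¦*", ("ref", 0)),
           ("EXPR$1", ("ref", 1)),
           ("E_X_P_R_H_A_S_H_F_I_E_L_D",
            ("call", "hash32AsDouble", (("ref", 1),)))]

merged = merge_projects(p_03_01, merge_projects(p_03_02, p_03_03))
for name, expr in merged:
    print(f"{name}=[{render(expr)}]")
```

Under these assumptions the merged project computes T0¦¦*=[$0], 
EXPR$1=[ITEM($1, 0)] and 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble(ITEM($1, 0))] directly over the scan. 
Note that this naive substitution writes ITEM($1, 0) twice, so a real merge 
would need either common-subexpression handling or a cost check. Calcite, which 
Drill's planner is built on, has a ProjectMergeRule for this kind of rewrite; 
whether it can apply to this plan is not something the sketch establishes.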



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
