Nathaniel Auvil created DRILL-3897:
--------------------------------------

             Summary: Partitions not being pruned
                 Key: DRILL-3897
                 URL: https://issues.apache.org/jira/browse/DRILL-3897
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Nathaniel Auvil


I have a two-level partitioning structure (/archive/psn/<server>/<dayId>/). Drill
is not pruning partitions correctly: it reads every file under every directory
instead of only the directories that match the filter. My source files are
tab-delimited.

My query:

    select dir0 server, dir1 dayId, max(LENGTH(columns[2])) maxSize
    from dfs.`/archive/psn`
    where dir1 >= 20151001
    group by dir0, dir1
    order by maxSize
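As a rough illustration of the behavior expected here, the Python sketch below applies the query's dir1 >= 20151001 predicate to directory names alone, so a day directory that fails it is discarded before any file in it is opened. It assumes, as the plan's file paths suggest, that dir0 is the server directory and dir1 the yyyyMMdd day directory directly below /archive/psn (the maprfs: scheme is dropped). The helper names partition_values and prune, and the 20151002 path, are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical sketch of directory-based pruning, not Drill's implementation.
# Assumption: dir0 is the first directory level below the selection root
# (the server) and dir1 is the second (a yyyyMMdd day).
SELECTION_ROOT = PurePosixPath("/archive/psn")


def partition_values(path, root=SELECTION_ROOT):
    """Return the (dir0, dir1) directory names for a file under root."""
    parts = PurePosixPath(path).relative_to(root).parts
    return parts[0], parts[1]


def prune(paths, min_day=20151001):
    """Keep only paths whose dir1 satisfies dir1 >= min_day.

    The decision uses the directory name alone, so no file has to be read
    to discard a whole day directory.
    """
    kept = []
    for path in paths:
        _server, day = partition_values(path)
        if int(day) >= min_day:  # assumes every dir1 is an all-digit date
            kept.append(path)
    return kept


example_paths = [
    # Taken from the Scan file list in the plan: should be pruned.
    "/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink",
    # Hypothetical file in a day directory that satisfies the filter.
    "/archive/psn/PAWCHSCCOMPA2/20151002/psns-2015.10.02-00.00.sink",
]

print(prune(example_paths))  # prints only the 20151002 path
```

With correct pruning, every file under PAWCHSCCOMPA2/20150130 would be dropped this way, which is the opposite of what the Scan operator below shows.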

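Plan snippet showing Drill reading unneeded files (the Scan has numFiles=116213
and lists files from 20150130, which fail the dir1 >= 20151001 filter):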


00-00    Screen : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.898214127009591E11 rows, 3.373451719812133E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44973
00-01      Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11 rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44972
00-02        SingleMergeExchange(sort0=[2 ASC]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11 rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44971
01-01          SelectionVectorRemover : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.892519131195501E11 rows, 3.3589035941415938E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44970
01-02            Sort(sort0=[$2], dir0=[ASC]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.887341862273601E11 rows, 3.358385867249404E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44969
01-03              Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11 rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.50347889491976E12 memory}, id = 44968
01-04                HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11 rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.50347889491976E12 memory}, id = 44967
02-01                  UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {4.876987324429801E11 rows, 3.2901543998674497E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory}, id = 44966
03-01                    Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {4.871810055507901E11 rows, 3.28963667297526E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory}, id = 44965
03-02                      HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.866632786586001E11 rows, 3.2875657654065E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.50347889491976E12 memory}, id = 44964
03-03                        Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44963
03-04                          HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44962
04-01                            UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.7630874081480005E11 rows, 3.0804750085305E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44961
05-01                              Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.711314718929E11 rows, 3.0752977396086E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44960
05-02                                HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost = {4.65954202971E11 rows, 3.054588663921E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44959
05-03                                  Project(server=[$0], dayId=[$1], $f2=[LENGTH($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY $f2): rowcount = 5.1772689219E10, cumulative cost = {4.14181513752E11 rows, 1.604953365789E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44958
05-04                                    SelectionVectorRemover : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {3.62408824533E11 rows, 1.397862608913E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44957
05-05                                      Filter(condition=[>=($1, 20151001)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {3.10636135314E11 rows, 1.346089919694E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44956
05-06                                        Project(dir0=[$0], dir1=[$2], ITEM=[ITEM($1, 2)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 1.03545378438E11, cumulative cost = {2.07090756876E11 rows, 7.24817649066E11 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44955
05-07                                          Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn, numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]],
files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.30.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-17.15.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-23.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.15.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-14.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-20.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.30.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-10.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-01.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.15.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-21.15.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-11.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-08.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-05.45.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-16.00.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-19.30.sink, 
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.15.sink, ...
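To check a plan like this one mechanically, a small Python sketch can pull the file paths out of the Scan operator's files=[...] list and report those whose dir1 is below 20151001; every file it reports is one that pruning should have removed. The regular expression assumes the maprfs:/archive/psn/<dir0>/<dir1>/<file> layout seen in the plan, and unpruned_files is a hypothetical helper name, not part of Drill.

```python
import re

# Hypothetical diagnostic, not part of Drill: scan the text of an EXPLAIN
# Scan operator and list the files whose dir1 fails "dir1 >= min_day".
# Assumption: file entries look like maprfs:/archive/psn/<dir0>/<dir1>/<file>.
FILE_RE = re.compile(r"maprfs:/archive/psn/([^/,\s\]]+)/([^/,\s\]]+)/([^,\s\]]+)")


def unpruned_files(scan_text, min_day=20151001):
    """Return file paths listed in scan_text that a pruned plan should omit."""
    offending = []
    for match in FILE_RE.finditer(scan_text):
        day = match.group(2)  # the dir1 directory name
        if day.isdigit() and int(day) < min_day:
            offending.append(match.group(0))
    return offending


# First two file entries of the Scan operator in the plan above.
scan_excerpt = (
    "Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn, "
    "numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]], "
    "files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink, "
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, ..."
)

for path in unpruned_files(scan_excerpt):
    print(path)
```

On this excerpt both 20150130 files are reported. The selectionRoot entry is skipped because it has no directory levels below /archive/psn.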

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
