Nathaniel Auvil created DRILL-3897:
--------------------------------------
Summary: Partitions not being pruned
Key: DRILL-3897
URL: https://issues.apache.org/jira/browse/DRILL-3897
Project: Apache Drill
Issue Type: Bug
Reporter: Nathaniel Auvil
I have a two-level partitioning structure. Drill is not pruning partitions
correctly: it reads all files under every directory. My source files are
tab-delimited.
My query:
select dir0 server, dir1 dayId, max(LENGTH(columns[2])) maxSize from
dfs.`/archive/psn` where dir1 >= 20151001 group by dir0,dir1 order by maxSize
Plan snippet showing Drill reading unneeded files:
00-00 Screen : rowType = RecordType(ANY server, ANY dayId, ANY maxSize):
rowcount = 5.1772689218999994E8, cumulative cost = {4.898214127009591E11 rows,
3.373451719812133E12 cpu, 0.0 io, 9.966863946928127E13 network,
1.51590434033232E12 memory}, id = 44973
00-01 Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType =
RecordType(ANY server, ANY dayId, ANY maxSize): rowcount =
5.1772689218999994E8, cumulative cost = {4.897696400117401E11 rows,
3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network,
1.51590434033232E12 memory}, id = 44972
00-02 SingleMergeExchange(sort0=[2 ASC]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative
cost = {4.897696400117401E11 rows, 3.3733999471229136E12 cpu, 0.0 io,
9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44971
01-01 SelectionVectorRemover : rowType = RecordType(ANY server, ANY
dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost =
{4.892519131195501E11 rows, 3.3589035941415938E12 cpu, 0.0 io,
9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44970
01-02 Sort(sort0=[$2], dir0=[ASC]) : rowType = RecordType(ANY
server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative
cost = {4.887341862273601E11 rows, 3.358385867249404E12 cpu, 0.0 io,
9.330681141805055E13 network, 1.51590434033232E12 memory}, id = 44969
01-03 Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType =
RecordType(ANY server, ANY dayId, ANY maxSize): rowcount =
5.1772689218999994E8, cumulative cost = {4.882164593351701E11 rows,
3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network,
1.50347889491976E12 memory}, id = 44968
01-04 HashToRandomExchange(dist0=[[$2]]) : rowType =
RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11 rows,
3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network,
1.50347889491976E12 memory}, id = 44967
02-01 UnorderedMuxExchange : rowType = RecordType(ANY server,
ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount =
5.1772689218999994E8, cumulative cost = {4.876987324429801E11 rows,
3.2901543998674497E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.50347889491976E12 memory}, id = 44966
03-01 Project(server=[$0], dayId=[$1], maxSize=[$2],
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2))]) : rowType =
RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689218999994E8, cumulative cost = {4.871810055507901E11 rows,
3.28963667297526E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.50347889491976E12 memory}, id = 44965
03-02 HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType
= RecordType(ANY server, ANY dayId, ANY maxSize): rowcount =
5.1772689218999994E8, cumulative cost = {4.866632786586001E11 rows,
3.2875657654065E12 cpu, 0.0 io, 8.48243740164096E13 network,
1.50347889491976E12 memory}, id = 44964
03-03 Project(server=[$0], dayId=[$1], maxSize=[$2]) :
rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount =
5.1772689219E9, cumulative cost = {4.814860097367001E11 rows,
3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12
memory}, id = 44963
03-04 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
: rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY
E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost =
{4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13
network, 1.3667989953816E12 memory}, id = 44962
04-01 UnorderedMuxExchange : rowType =
RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D):
rowcount = 5.1772689219E9, cumulative cost = {4.7630874081480005E11 rows,
3.0804750085305E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id =
44961
05-01 Project(server=[$0], dayId=[$1],
maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1,
hash64AsDouble($0)))]) : rowType = RecordType(ANY server, ANY dayId, ANY
maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative
cost = {4.711314718929E11 rows, 3.0752977396086E12 cpu, 0.0 io, 0.0 network,
1.3667989953816E12 memory}, id = 44960
05-02 HashAgg(group=[{0, 1}], maxSize=[MAX($2)])
: rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount =
5.1772689219E9, cumulative cost = {4.65954202971E11 rows, 3.054588663921E12
cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44959
05-03 Project(server=[$0], dayId=[$1],
$f2=[LENGTH($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY $f2):
rowcount = 5.1772689219E10, cumulative cost = {4.14181513752E11 rows,
1.604953365789E12 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44958
05-04 SelectionVectorRemover : rowType =
RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10,
cumulative cost = {3.62408824533E11 rows, 1.397862608913E12 cpu, 0.0 io, 0.0
network, 0.0 memory}, id = 44957
05-05 Filter(condition=[>=($1, 20151001)])
: rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount =
5.1772689219E10, cumulative cost = {3.10636135314E11 rows, 1.346089919694E12
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44956
05-06 Project(dir0=[$0], dir1=[$2],
ITEM=[ITEM($1, 2)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM):
rowcount = 1.03545378438E11, cumulative cost = {2.07090756876E11 rows,
7.24817649066E11 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44955
05-07 Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/archive/psn, numFiles=116213, columns=[`dir0`, `dir1`,
`columns`[2]],
files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-17.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-23.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-14.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-20.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-10.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-01.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-21.15.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-11.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-08.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-05.45.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-16.00.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-19.30.sink,
maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.15.sink, ...
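
To make the expected behaviour concrete, here is a minimal Python sketch (not Drill code) of the pruning the filter `dir1 >= 20151001` should imply for the files listed in the scan. The helper names `partition_dirs` and `survives_pruning` are hypothetical, the numeric comparison of the directory name is an assumption about the intended semantics, and the `SERVERX/20151002` path is invented purely for illustration.

```python
from pathlib import PurePosixPath

# Selection root taken from the scan above (selectionRoot=maprfs:/archive/psn).
SELECTION_ROOT = "/archive/psn"


def partition_dirs(path, root=SELECTION_ROOT):
    """Return the directory levels below the root, i.e. (dir0, dir1, ...)."""
    # Drop a leading scheme such as "maprfs:" if one is present.
    plain = path.split(":", 1)[-1]
    rel = PurePosixPath(plain).relative_to(root)
    return rel.parts[:-1]  # everything except the file name


def survives_pruning(path, min_day=20151001):
    """True if the file's dir1 satisfies dir1 >= min_day.

    Assumption: dir1 is compared as a number, as written in the query.
    """
    dirs = partition_dirs(path)
    return len(dirs) >= 2 and dirs[1].isdigit() and int(dirs[1]) >= min_day


files = [
    # Two paths copied from the scan's file list.
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink",
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink",
    # Hypothetical path, not from the report, that should be kept.
    "maprfs:/archive/psn/SERVERX/20151002/example.sink",
]

kept = [f for f in files if survives_pruning(f)]
print(kept)  # only the hypothetical 20151002 path remains
```

Under these assumptions, none of the 20150130 files shown in the scan should be read at all, which is why their presence in the plan suggests pruning did not happen.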