Gautam Parai created DRILL-7245:
-----------------------------------
Summary: TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics
is enabled at sf 100
Key: DRILL-7245
URL: https://issues.apache.org/jira/browse/DRILL-7245
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Gautam Parai
Assignee: Gautam Parai
Fix For: 1.17.0
Here is query 65:
{noformat}
SELECT s_store_name,
i_item_desc,
sc.revenue,
i_current_price,
i_wholesale_cost,
i_brand
FROM store,
item,
(SELECT ss_store_sk,
Avg(revenue) AS ave
FROM (SELECT ss_store_sk,
ss_item_sk,
Sum(ss_sales_price) AS revenue
FROM store_sales,
date_dim
WHERE ss_sold_date_sk = d_date_sk
AND d_month_seq BETWEEN 1199 AND 1199 + 11
GROUP BY ss_store_sk,
ss_item_sk) sa
GROUP BY ss_store_sk) sb,
(SELECT ss_store_sk,
ss_item_sk,
Sum(ss_sales_price) AS revenue
FROM store_sales,
date_dim
WHERE ss_sold_date_sk = d_date_sk
AND d_month_seq BETWEEN 1199 AND 1199 + 11
GROUP BY ss_store_sk,
ss_item_sk) sc
WHERE sb.ss_store_sk = sc.ss_store_sk
AND sc.revenue <= 0.1 * sb.ave
AND s_store_sk = sc.ss_store_sk
AND i_item_sk = sc.ss_item_sk
ORDER BY s_store_name,
i_item_desc
LIMIT 100;
{noformat}
Here is the new plan.
{noformat}
00-00 Screen : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY
revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount =
100.0, cumulative cost = {1.170402776556728E9 rows, 9.49229008144672E9 cpu,
2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id
= 45433
00-01 Project(s_store_name=[$0], i_item_desc=[$1], revenue=[$2],
i_current_price=[$3], i_wholesale_cost=[$4], i_brand=[$5]) : rowType =
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price,
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost =
{1.170402766556728E9 rows, 9.49229007144672E9 cpu, 2.305289192E9 io,
5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45432
00-02 SelectionVectorRemover : rowType = RecordType(ANY s_store_name,
ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY
i_brand): rowcount = 100.0, cumulative cost = {1.170402666556728E9 rows,
9.49228947144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network,
1.3320421632282749E8 memory}, id = 45431
00-03 Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY
i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY
i_brand): rowcount = 100.0, cumulative cost = {1.170402566556728E9 rows,
9.49228937144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network,
1.3320421632282749E8 memory}, id = 45430
00-04 SingleMergeExchange(sort0=[0], sort1=[1]) : rowType =
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price,
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost =
{1.170402466556728E9 rows, 9.49228897144672E9 cpu, 2.305289192E9 io,
5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45429
01-01 OrderedMuxExchange(sort0=[0], sort1=[1]) : rowType =
RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price,
ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost =
{1.170402366556728E9 rows, 9.492286842675482E9 cpu, 2.305289192E9 io,
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45428
02-01 SelectionVectorRemover : rowType = RecordType(ANY
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost =
{1.170402266556728E9 rows, 9.492286742675482E9 cpu, 2.305289192E9 io,
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45427
02-02 Limit(fetch=[100]) : rowType = RecordType(ANY
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost =
{1.170402166556728E9 rows, 9.492286642675482E9 cpu, 2.305289192E9 io,
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45426
02-03 SelectionVectorRemover : rowType = RecordType(ANY
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost =
{1.170402066556728E9 rows, 9.492286242675482E9 cpu, 2.305289192E9 io,
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45425
02-04 TopN(limit=[100]) : rowType = RecordType(ANY
s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost =
{1.1699855881208198E9 rows, 9.491869764239574E9 cpu, 2.305289192E9 io,
5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45424
02-05 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) :
rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY
i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount =
416478.4359081372, cumulative cost = {1.1695691096849117E9 rows,
9.469733581565046E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network,
1.3320421632282749E8 memory}, id = 45423
03-01 Project(s_store_name=[$1], i_item_desc=[$3],
revenue=[$11], i_current_price=[$4], i_wholesale_cost=[$5], i_brand=[$6]) :
rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY
i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount =
416478.4359081372, cumulative cost = {1.1691526312490036E9 rows,
9.464735840334148E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network,
1.3320421632282749E8 memory}, id = 45422
03-02 SelectionVectorRemover : rowType =
RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc,
ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY
ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount =
416478.4359081372, cumulative cost = {1.1687361528130956E9 rows,
9.4622369697187E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network,
1.3320421632282749E8 memory}, id = 45421
03-03 Filter(condition=[<=($11, *(0.1, $8))]) :
rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY
i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY
ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount
= 416478.4359081372, cumulative cost = {1.1683196743771875E9 rows,
9.461820491282793E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network,
1.3320421632282749E8 memory}, id = 45420
03-04 Project(s_store_sk=[$5],
s_store_name=[$6], i_item_sk=[$7], i_item_desc=[$8], i_current_price=[$9],
i_wholesale_cost=[$10], i_brand=[$11], ss_store_sk=[$3], ave=[$4],
ss_store_sk0=[$0], ss_item_sk=[$1], revenue=[$2]) : rowType = RecordType(ANY
s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY
i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave,
ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 832956.8718162744,
cumulative cost = {1.1674867175053713E9 rows, 9.456822750051895E9 cpu,
2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory},
id = 45419
03-05 HashJoin(condition=[=($7, $1)],
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk,
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY
s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 832956.8718162744, cumulative cost =
{1.1666537606335552E9 rows, 9.4468272675901E9 cpu, 2.305289192E9 io,
4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45418
03-07 HashToRandomExchange(dist0=[[$1]]) :
rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY
ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount =
832377.0685035395, cumulative cost = {1.1652093835650516E9 rows,
9.430922742768057E9 cpu, 2.304269192E9 io, 4.544006312385362E10 network,
1.2961381632282749E8 memory}, id = 45415
04-01 HashJoin(condition=[=($5, $0)],
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk,
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY
s_store_name): rowcount = 832377.0685035395, cumulative cost =
{1.164377006496548E9 rows, 9.417604709672E9 cpu, 2.304269192E9 io,
2.157414781572014E10 network, 1.2961381632282749E8 memory}, id = 45414
04-03 HashJoin(condition=[=($3, $0)],
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk,
ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave): rowcount =
832377.0685035397, cumulative cost = {1.1635434234280443E9 rows,
9.407608948849958E9 cpu, 2.304268388E9 io, 2.1544179841320137E10 network,
1.2960674112282749E8 memory}, id = 45411
04-05 HashAgg(group=[{0, 1}],
revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY
revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8
rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network,
5.674418293829948E7 memory}, id = 45393
04-07
HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397,
cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu,
1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory},
id = 45392
06-01 HashAgg(group=[{0, 1}],
revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY
revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8
rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network,
3.476942832980602E7 memory}, id = 45391
06-02 Project(ss_store_sk=[$2],
ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk,
ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative
cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io,
2.48987648E7 network, 5878.400000000001 memory}, id = 45390
06-03
HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) :
rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY
ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734,
cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io,
2.48987648E7 network, 5878.400000000001 memory}, id = 45389
06-05 Scan(table=[[dfs,
/tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1,
numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`,
`ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY
ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount
= 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu,
1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45384
06-04 BroadcastExchange :
rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0,
cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7
network, 0.0 memory}, id = 45388
09-01
SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq):
rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io,
0.0 network, 0.0 memory}, id = 45387
09-02
Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY
d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0
rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45386
09-03
Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1,
numRowGroups=1, usedMetadataFile=false,
filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) ,
less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`,
`d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq):
rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io,
0.0 network, 0.0 memory}, id = 45385
04-04 Project(ss_store_sk0=[$0],
ave=[$1]) : rowType = RecordType(ANY ss_store_sk0, ANY ave): rowcount = 202.0,
cumulative cost = {5.822713399551244E8 rows, 4.714959034142926E9 cpu,
1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id
= 45410
04-06 BroadcastExchange : rowType =
RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost =
{5.822711379551244E8 rows, 4.714958630142926E9 cpu, 1.152134194E9 io,
1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45409
07-01 Project(ss_store_sk=[$0],
ave=[divide(CastHigh(CASE(=($2, 0), null, $1)), $2)]) : rowType =
RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost =
{5.822709359551244E8 rows, 4.714957014142926E9 cpu, 1.152134194E9 io,
1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45408
07-02 HashAgg(group=[{0}],
agg#0=[$SUM0($1)], agg#1=[$SUM0($2)]) : rowType = RecordType(ANY ss_store_sk,
ANY $f1, BIGINT $f2): rowcount = 202.0, cumulative cost = {5.822707339551244E8
rows, 4.714956004142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network,
7.2859002984528E7 memory}, id = 45407
07-03
HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY ss_store_sk, ANY
$f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost =
{5.82187496248274E8 rows, 4.712292397523715E9 cpu, 1.152134194E9 io,
1.1275973124348644E10 network, 7.139401934396178E7 memory}, id = 45406
10-01 HashAgg(group=[{0}],
agg#0=[$SUM0($1)], agg#1=[COUNT($1)]) : rowType = RecordType(ANY ss_store_sk,
ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost =
{5.821042585414236E8 rows, 4.710960594214109E9 cpu, 1.152134194E9 io,
1.0253148182571495E10 network, 7.139401934396178E7 memory}, id = 45405
10-02
Project(ss_store_sk=[$0], revenue=[$2]) : rowType = RecordType(ANY ss_store_sk,
ANY revenue): rowcount = 832377.0685035397, cumulative cost =
{5.812718814729201E8 rows, 4.6843245280219965E9 cpu, 1.152134194E9 io,
1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45404
10-03 HashAgg(group=[{0,
1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk,
ANY revenue): rowcount = 832377.0685035397, cumulative cost =
{5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io,
1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45403
10-04
HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397,
cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu,
1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory},
id = 45402
11-01
HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY
ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397,
cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu,
1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45401
11-02
Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType =
RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount =
1316801.1337047734, cumulative cost = {5.774579491337048E8 rows,
4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network,
5878.400000000001 memory}, id = 45400
11-03
HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) :
rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY
ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734,
cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io,
2.48987648E7 network, 5878.400000000001 memory}, id = 45399
11-05
Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]],
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tpcdsParquet10/SF100/store_sales]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1,
numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`,
`ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY
ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount
= 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu,
1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45394
11-04
BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq):
rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io,
2.48987648E7 network, 0.0 memory}, id = 45398
12-01
SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq):
rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io,
0.0 network, 0.0 memory}, id = 45397
12-02
Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY
d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0
rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45396
12-03
Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1,
numRowGroups=1, usedMetadataFile=false,
filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) ,
less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`,
`d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq):
rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io,
0.0 network, 0.0 memory}, id = 45395
04-02 BroadcastExchange : rowType =
RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost
= {804.0 rows, 4020.0 cpu, 804.0 io, 2.99679744E7 network, 0.0 memory}, id =
45413
08-01 Scan(table=[[dfs,
/tpcdsParquet10/SF100/store]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/store, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`s_store_sk`, `s_store_name`]]]) : rowType =
RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost
= {402.0 rows, 804.0 cpu, 804.0 io, 0.0 network, 0.0 memory}, id = 45412
03-06 HashToRandomExchange(dist0=[[$0]]) :
rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY
i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost =
{408000.0 rows, 4284000.0 cpu, 1020000.0 io, 4.17792E9 network, 0.0 memory}, id
= 45417
05-01 Scan(table=[[dfs,
/tpcdsParquet10/SF100/item]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/item]],
selectionRoot=maprfs:/tpcdsParquet10/SF100/item, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`i_item_sk`, `i_item_desc`, `i_current_price`,
`i_wholesale_cost`, `i_brand`]]]) : rowType = RecordType(ANY i_item_sk, ANY
i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount
= 204000.0, cumulative cost = {204000.0 rows, 1020000.0 cpu, 1020000.0 io, 0.0
network, 0.0 memory}, id = 45416
{noformat}
For query 65, there are two HashAggs (operator 04-05 and 10-03) that are taking
a long time. Their rowcounts are incorrect. Operator 04-05 is estimated to be
800K and is actually 55M. Operator 10-03 is estimated to be 800K and is also
55M. As a result, these two operators only have 9 minor fragments in the new
plan. This may be caused by the HashJoin rowcount for operator (06-03). The
estimated rowcount for the HashJoin is 1.3M rows but the actual rowcount is 55M
rows.
For query 1, hash join operator 03-04 may have a similar problem.
For query 45, hash join operator 16-02 may have a similar problem.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)