[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543613#comment-16543613
 ] 

Khurram Faraaz commented on DRILL-6453:
---------------------------------------

Results of executing a simplified version of TPC-DS query 72 that keeps only the 
first three joins, starting from the leaf level of the plan.
The query below took 07 min 46.719 sec to complete.

{noformat}
SELECT
 Count(*) total_cnt 
FROM catalog_sales 
JOIN inventory 
 ON ( cs_item_sk = inv_item_sk ) 
JOIN customer_demographics 
 ON ( cs_bill_cdemo_sk = cd_demo_sk ) 
JOIN household_demographics 
 ON ( cs_bill_hdemo_sk = hd_demo_sk ) 
WHERE inv_quantity_on_hand < cs_quantity 
 AND hd_buy_potential = '501-1000' 
 AND cd_marital_status = 'M' 
LIMIT 100
{noformat}

{noformat}
00-00 Screen : rowType = RecordType(BIGINT total_cnt): rowcount = 100.0, cumulative cost = {9.7136055E7 rows, 6.08208382E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2694
00-01 Project(total_cnt=[$0]) : rowType = RecordType(BIGINT total_cnt): rowcount = 100.0, cumulative cost = {9.7136045E7 rows, 6.08208372E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2693
00-02 SelectionVectorRemover : rowType = RecordType(BIGINT total_cnt): rowcount = 100.0, cumulative cost = {9.7135945E7 rows, 6.08208272E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2692
00-03 Limit(fetch=[100]) : rowType = RecordType(BIGINT total_cnt): rowcount = 100.0, cumulative cost = {9.7135845E7 rows, 6.08208172E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2691
00-04 StreamAgg(group=[{}], total_cnt=[$SUM0($0)]) : rowType = RecordType(BIGINT total_cnt): rowcount = 1.0, cumulative cost = {9.7135745E7 rows, 6.08207772E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2690
00-05 StreamAgg(group=[{}], total_cnt=[COUNT()]) : rowType = RecordType(BIGINT total_cnt): rowcount = 1.0, cumulative cost = {9.7135744E7 rows, 6.0820776E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2689
00-06 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 5872500.0, cumulative cost = {9.1263244E7 rows, 5.3773776E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2688
00-07 HashJoin(condition=[=($0, $1)], joinType=[inner]) : rowType = RecordType(ANY cs_bill_hdemo_sk, ANY hd_demo_sk): rowcount = 5872500.0, cumulative cost = {8.5390744E7 rows, 5.1424776E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04611648E7 memory}, id = 2687
00-09 Project(cs_bill_hdemo_sk=[$1]) : rowType = RecordType(ANY cs_bill_hdemo_sk): rowcount = 5872500.0, cumulative cost = {7.9500604E7 rows, 4.4371944E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04421568E7 memory}, id = 2682
00-11 HashJoin(condition=[=($0, $2)], joinType=[inner]) : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cd_demo_sk): rowcount = 5872500.0, cumulative cost = {7.3628104E7 rows, 4.3784694E8 cpu, 0.0 io, 9.4473289728E10 network, 3.04421568E7 memory}, id = 2681
00-14 Project(cs_bill_cdemo_sk=[$0], cs_bill_hdemo_sk=[$1]) : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk): rowcount = 5872500.0, cumulative cost = {6.3049644E7 rows, 3.5181846E8 cpu, 0.0 io, 9.4473289728E10 network, 2.53712448E7 memory}, id = 2676
00-17 SelectionVectorRemover : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity, ANY inv_item_sk, ANY inv_quantity_on_hand): rowcount = 5872500.0, cumulative cost = {5.7177144E7 rows, 3.4007346E8 cpu, 0.0 io, 9.4473289728E10 network, 2.53712448E7 memory}, id = 2675
00-19 Filter(condition=[<($5, $3)]) : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity, ANY inv_item_sk, ANY inv_quantity_on_hand): rowcount = 5872500.0, cumulative cost = {5.1304644E7 rows, 3.3420096E8 cpu, 0.0 io, 9.4473289728E10 network, 2.53712448E7 memory}, id = 2674
00-21 Project(cs_bill_cdemo_sk=[$2], cs_bill_hdemo_sk=[$3], cs_item_sk=[$4], cs_quantity=[$5], inv_item_sk=[$0], inv_quantity_on_hand=[$1]) : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity, ANY inv_item_sk, ANY inv_quantity_on_hand): rowcount = 1.1745E7, cumulative cost = {3.9559644E7 rows, 2.6373096E8 cpu, 0.0 io, 9.4473289728E10 network, 2.53712448E7 memory}, id = 2673
00-22 HashJoin(condition=[=($4, $0)], joinType=[inner]) : rowType = RecordType(ANY inv_item_sk, ANY inv_quantity_on_hand, ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity): rowcount = 1.1745E7, cumulative cost = {2.7814644E7 rows, 1.9326096E8 cpu, 0.0 io, 9.4473289728E10 network, 2.53712448E7 memory}, id = 2672
00-24 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/tpcds_sf1/parquet/inventory]], selectionRoot=/drill/testdata/tpcds_sf1/parquet/inventory, numFiles=1, numRowGroups=1, usedMetadataFile=true, cacheFileRoot=/drill/testdata/tpcds_sf1/parquet/inventory, columns=[`inv_item_sk`, `inv_quantity_on_hand`]]]) : rowType = RecordType(ANY inv_item_sk, ANY inv_quantity_on_hand): rowcount = 1.1745E7, cumulative cost = {1.1745E7 rows, 2.349E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2669
00-23 BroadcastExchange : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity): rowcount = 1441548.0, cumulative cost = {2883096.0 rows, 1.7298576E7 cpu, 0.0 io, 9.4473289728E10 network, 0.0 memory}, id = 2671
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/tpcds_sf1/parquet/catalog_sales]], selectionRoot=/drill/testdata/tpcds_sf1/parquet/catalog_sales, numFiles=1, numRowGroups=2, usedMetadataFile=true, cacheFileRoot=/drill/testdata/tpcds_sf1/parquet/catalog_sales, columns=[`cs_bill_cdemo_sk`, `cs_bill_hdemo_sk`, `cs_item_sk`, `cs_quantity`]]]) : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity): rowcount = 1441548.0, cumulative cost = {1441548.0 rows, 5766192.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2670
00-13 Project(cd_demo_sk=[$0]) : rowType = RecordType(ANY cd_demo_sk): rowcount = 288120.0, cumulative cost = {4417840.0 rows, 1.325352E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2680
00-16 SelectionVectorRemover : rowType = RecordType(ANY cd_demo_sk, ANY cd_marital_status): rowcount = 288120.0, cumulative cost = {4129720.0 rows, 1.29654E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2679
00-18 Filter(condition=[=($1, 'M')]) : rowType = RecordType(ANY cd_demo_sk, ANY cd_marital_status): rowcount = 288120.0, cumulative cost = {3841600.0 rows, 1.267728E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2678
00-20 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpcds_sf1/parquet/customer_demographics]], selectionRoot=maprfs:/drill/testdata/tpcds_sf1/parquet/customer_demographics, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`cd_demo_sk`, `cd_marital_status`]]]) : rowType = RecordType(ANY cd_demo_sk, ANY cd_marital_status): rowcount = 1920800.0, cumulative cost = {1920800.0 rows, 3841600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2677
00-08 Project(hd_demo_sk=[$0]) : rowType = RecordType(ANY hd_demo_sk): rowcount = 1080.0, cumulative cost = {16560.0 rows, 49680.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2686
00-10 SelectionVectorRemover : rowType = RecordType(ANY hd_demo_sk, ANY hd_buy_potential): rowcount = 1080.0, cumulative cost = {15480.0 rows, 48600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2685
00-12 Filter(condition=[=($1, '501-1000')]) : rowType = RecordType(ANY hd_demo_sk, ANY hd_buy_potential): rowcount = 1080.0, cumulative cost = {14400.0 rows, 47520.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2684
00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/tpcds_sf1/parquet/household_demographics]], selectionRoot=/drill/testdata/tpcds_sf1/parquet/household_demographics, numFiles=1, numRowGroups=1, usedMetadataFile=true, cacheFileRoot=/drill/testdata/tpcds_sf1/parquet/household_demographics, columns=[`hd_demo_sk`, `hd_buy_potential`]]]) : rowType = RecordType(ANY hd_demo_sk, ANY hd_buy_potential): rowcount = 7200.0, cumulative cost = {7200.0 rows, 14400.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2683

{noformat}
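
For anyone comparing the planner's estimates against the actual row counts in the query profile, the plan text above can be reduced to one (operator id, operator name, estimated rowcount) triple per operator. The following is only a rough sketch: `summarize_plan` and `SAMPLE` are hypothetical names, not part of Drill, and the regular expressions assume the operator layout seen in this particular plan. `SAMPLE` holds two operators taken from the plan, with mail-style line wrapping left in to show that wrapping is tolerated.

```python
import re

# An operator starts a line with "<major>-<minor> Name", e.g. "00-22 HashJoin".
OP_START = re.compile(r"^(\d{2}-\d{2})\s+([A-Za-z]+)", re.MULTILINE)
# Estimated row count; \s* lets the match span a wrapped line.
ROWCOUNT = re.compile(r"rowcount\s*=\s*([0-9.]+(?:E[0-9]+)?)")


def summarize_plan(plan_text):
    """Return a list of (operator id, operator name, estimated rowcount)."""
    starts = list(OP_START.finditer(plan_text))
    rows = []
    for i, match in enumerate(starts):
        # An operator's text runs until the next operator begins.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(plan_text)
        chunk = plan_text[match.start():end]
        found = ROWCOUNT.search(chunk)
        estimate = float(found.group(1)) if found else None
        rows.append((match.group(1), match.group(2), estimate))
    return rows


# Two operators from the plan above (line wrapping kept on purpose).
SAMPLE = """
00-22 HashJoin(condition=[=($4, $0)], joinType=[inner]) : rowType =
RecordType(ANY inv_item_sk, ANY inv_quantity_on_hand, ANY cs_bill_cdemo_sk, ANY
cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity): rowcount = 1.1745E7,
cumulative cost = {2.7814644E7 rows, 1.9326096E8 cpu, 0.0 io, 9.4473289728E10
network, 2.53712448E7 memory}, id = 2672
00-23 BroadcastExchange : rowType = RecordType(ANY cs_bill_cdemo_sk, ANY
cs_bill_hdemo_sk, ANY cs_item_sk, ANY cs_quantity): rowcount = 1441548.0,
cumulative cost = {2883096.0 rows, 1.7298576E7 cpu, 0.0 io, 9.4473289728E10
network, 0.0 memory}, id = 2671
"""

if __name__ == "__main__":
    for op_id, name, estimate in summarize_plan(SAMPLE):
        print(op_id, name, estimate)
```

Feeding the whole plan text to `summarize_plan` gives a compact table that can be placed side by side with the per-operator record counts from the attached profile.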

> TPC-DS query 72 has regressed
> -----------------------------
>
>                 Key: DRILL-6453
>                 URL: https://issues.apache.org/jira/browse/DRILL-6453
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Khurram Faraaz
>            Assignee: Boaz Ben-Zvi
>            Priority: Blocker
>             Fix For: 1.14.0
>
>         Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018.txt, 
> jstack_29173_June_10_2018_b.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_c.txt, 
> jstack_29173_June_10_2018_d.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt, jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed. The query profile for the run that 
> was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit and 
> took around 55 seconds)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> The TPC-DS data in the run below stores date values as the DATE datatype, 
> not as VARCHAR.
> On Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 ran for 2 hrs and 11 mins without completing, so I had to 
> cancel it by stopping the Foreman drillbit.
> As a result, several minor fragments are reported to be in the 
> CANCELLATION_REQUESTED state in the UI.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
