[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sorabh Hamirwasia updated DRILL-7154: ------------------------------------- Labels: ready-to-commit (was: ) > TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled > ----------------------------------------------------------------------------- > > Key: DRILL-7154 > URL: https://issues.apache.org/jira/browse/DRILL-7154 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 1.16.0 > Reporter: Robert Hou > Assignee: Hanumath Rao Maduri > Priority: Blocker > Labels: ready-to-commit > Fix For: 1.16.0 > > Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, > 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, > hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, > hashagg.stats.disabled.foreman.log > > > Here is TPCH 04 with sf 1000: > {noformat} > select > o.o_orderpriority, > count(*) as order_count > from > orders o > where > o.o_orderdate >= date '1996-10-01' > and o.o_orderdate < date '1996-10-01' + interval '3' month > and > exists ( > select > * > from > lineitem l > where > l.l_orderkey = o.o_orderkey > and l.l_commitdate < l.l_receiptdate > ) > group by > o.o_orderpriority > order by > o.o_orderpriority; > {noformat} > TPCH query 4 takes 30% longer. The plan is the same. But the Hash Agg > operator in the new plan is taking longer. One possible reason is that the > Hash Agg operator in the new plan is not using as many buckets as the old > plan did. The memory usage of the Hash Agg operator in the new plan is using > less memory compared to the old plan. > Here is the old plan: > {noformat} > 00-00 Screen : rowType = RecordType(ANY o_orderpriority, BIGINT > order_count): rowcount = 3750000.0, cumulative cost = {1.9163601940441746E10 > rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 > network, 2.2631985057468002E10 memory}, id = 5645 > 00-01 Project(o_orderpriority=[$0], order_count=[$1]) : rowType = > RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3750000.0, > cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, > 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 > memory}, id = 5644 > 00-02 SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, > 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643 > 01-01 OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642 > 02-01 SelectionVectorRemover : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641 > 02-02 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640 > 02-03 HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType > = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3750000.0, > cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, > 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 > memory}, id = 5639 > 02-04 HashToRandomExchange(dist0=[[$0]]) : rowType = > RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, > cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, > 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 > memory}, id = 5638 > 03-01 HashAgg(group=[{0}], order_count=[COUNT()]) : > rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = > 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 > cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 > memory}, id = 5637 > 03-02 Project(o_orderpriority=[$1]) : rowType = > RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = > {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, > 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636 > 03-03 Project(o_orderkey=[$1], o_orderpriority=[$2], > l_orderkey=[$0]) : rowType = RecordType(ANY o_orderkey, ANY o_orderpriority, > ANY l_orderkey): rowcount = 3.75E8, cumulative cost = {1.8319476940441746E10 > rows, 8.108390595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 > network, 1.5311985057468002E10 memory}, id = 5635 > 03-04 HashJoin(condition=[=($1, $0)], > joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_orderkey, > ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = > {1.7944476940441746E10 rows, 7.995890595055101E10 cpu, 2.2499969127E10 io, > 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5634 > 03-05 HashToRandomExchange(dist0=[[$0]]) : rowType > = RecordType(ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, > cumulative cost = {4.125E9 rows, 2.2125E10 cpu, 4.5E9 io, 3.072E12 network, > 0.0 memory}, id = 5633 > 05-01 Project(o_orderkey=[$1], > o_orderpriority=[$2]) : rowType = RecordType(ANY o_orderkey, ANY > o_orderpriority): rowcount = 3.75E8, cumulative cost = {3.75E9 rows, > 1.6125E10 cpu, 4.5E9 io, 0.0 network, 0.0 memory}, id = 5632 > 05-02 SelectionVectorRemover : rowType = > RecordType(ANY o_orderdate, ANY o_orderkey, ANY o_orderpriority): rowcount = > 3.75E8, cumulative cost = {3.375E9 rows, 1.5375E10 cpu, 4.5E9 io, 0.0 > network, 0.0 memory}, id = 5631 > 05-03 Filter(condition=[AND(>=($0, > 1996-10-01), <($0, 1997-01-01 00:00:00))]) : rowType = RecordType(ANY > o_orderdate, ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, > cumulative cost = {3.0E9 rows, 1.5E10 cpu, 4.5E9 io, 0.0 network, 0.0 > memory}, id = 5630 > 05-04 Scan(table=[[dfs, tpchpar1000_micro, > orders]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///tpchParquet10/SF1000/orders]], > selectionRoot=maprfs:/tpchParquet10/SF1000/orders, numFiles=1, > numRowGroups=750, usedMetadataFile=false, columns=[`o_orderdate`, > `o_orderkey`, `o_orderpriority`]]]) : rowType = RecordType(ANY o_orderdate, > ANY o_orderkey, ANY o_orderpriority): rowcount = 1.5E9, cumulative cost = > {1.5E9 rows, 4.5E9 cpu, 4.5E9 io, 0.0 network, 0.0 memory}, id = 5629 > 03-06 HashAgg(group=[{0}]) : rowType = > RecordType(ANY l_orderkey): rowcount = 4499992.28175, cumulative cost = > {1.3439976948159996E10 rows, 5.477990604317001E10 cpu, 1.7999969127E10 io, > 1.8431968386048E11 network, 8.711985057468E9 memory}, id = 5628 > 03-07 HashToRandomExchange(dist0=[[$0]]) : > rowType = RecordType(ANY l_orderkey): rowcount = 4.49999228175E7, cumulative > cost = {1.3394977025342497E10 rows, 5.441990666063001E10 cpu, 1.7999969127E10 > io, 1.8431968386048E11 network, 7.919986415880001E9 memory}, id = 5627 > 04-01 HashAgg(group=[{0}]) : rowType = > RecordType(ANY l_orderkey): rowcount = 4.49999228175E7, cumulative cost = > {1.3349977102524998E10 rows, 5.369990789555001E10 cpu, 1.7999969127E10 io, > 0.0 network, 7.919986415880001E9 memory}, id = 5626 > 04-02 Project(l_orderkey=[$0], i=[true]) : > rowType = RecordType(ANY l_orderkey, BOOLEAN i): rowcount = 4.49999228175E8, > cumulative cost = {1.2899977874349998E10 rows, 5.009991407015001E10 cpu, > 1.7999969127E10 io, 0.0 network, 0.0 memory}, id = 5625 > 04-03 SelectionVectorRemover : rowType = > RecordType(ANY l_orderkey, ANY l_commitdate, ANY l_receiptdate): rowcount = > 4.49999228175E8, cumulative cost = {1.2449978646175E10 rows, > 4.784991792927501E10 cpu, 1.7999969127E10 io, 0.0 network, 0.0 memory}, id = > 5624 > 04-04 Filter(condition=[AND(=($0, $0), > <($1, $2))]) : rowType = RecordType(ANY l_orderkey, ANY l_commitdate, ANY > l_receiptdate): rowcount = 4.49999228175E8, cumulative cost = > {1.1999979418E10 rows, 4.7399918701100006E10 cpu, 1.7999969127E10 io, 0.0 > network, 0.0 memory}, id = 5623 > 04-05 Scan(table=[[dfs, > tpchpar1000_micro, lineitem]], groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tpchParquet10/SF1000/lineitem]], > selectionRoot=maprfs:/tpchParquet10/SF1000/lineitem, numFiles=1, > numRowGroups=3250, usedMetadataFile=false, columns=[`l_orderkey`, > `l_commitdate`, `l_receiptdate`]]]) : rowType = RecordType(ANY l_orderkey, > ANY l_commitdate, ANY l_receiptdate): rowcount = 5.999989709E9, cumulative > cost = {5.999989709E9 rows, 1.7999969127E10 cpu, 1.7999969127E10 io, 0.0 > network, 0.0 memory}, id = 5622 > {noformat} > Here is the new plan: > {noformat} > 00-00 Screen : rowType = RecordType(ANY o_orderpriority, BIGINT > order_count): rowcount = 3750000.0, cumulative cost = {1.9163601940441746E10 > rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 > network, 2.2631985057468002E10 memory}, id = 11739 > 00-01 Project(o_orderpriority=[$0], order_count=[$1]) : rowType = > RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3750000.0, > cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, > 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 > memory}, id = 11738 > 00-02 SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, > 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 11737 > 01-01 OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 11736 > 02-01 SelectionVectorRemover : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 11735 > 02-02 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY > o_orderpriority, BIGINT order_count): rowcount = 3750000.0, cumulative cost = > {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, > 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 11734 > 02-03 HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType > = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3750000.0, > cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, > 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 > memory}, id = 11733 > 02-04 HashToRandomExchange(dist0=[[$0]]) : rowType = > RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, > cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, > 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 > memory}, id = 11732 > 03-01 HashAgg(group=[{0}], order_count=[COUNT()]) : > rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = > 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 > cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 > memory}, id = 11731 > 03-02 Project(o_orderpriority=[$1]) : rowType = > RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = > {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, > 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 11730 > 03-03 Project(o_orderkey=[$1], o_orderpriority=[$2], > l_orderkey=[$0]) : rowType = RecordType(ANY o_orderkey, ANY o_orderpriority, > ANY l_orderkey): rowcount = 3.75E8, cumulative cost = {1.8319476940441746E10 > rows, 8.108390595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 > network, 1.5311985057468002E10 memory}, id = 11729 > 03-04 HashJoin(condition=[=($1, $0)], > joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_orderkey, > ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = > {1.7944476940441746E10 rows, 7.995890595055101E10 cpu, 2.2499969127E10 io, > 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 11728 > 03-05 HashToRandomExchange(dist0=[[$0]]) : rowType > = RecordType(ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, > cumulative cost = {4.125E9 rows, 2.2125E10 cpu, 4.5E9 io, 3.072E12 network, > 0.0 memory}, id = 11727 > 05-01 Project(o_orderkey=[$1], > o_orderpriority=[$2]) : rowType = RecordType(ANY o_orderkey, ANY > o_orderpriority): rowcount = 3.75E8, cumulative cost = {3.75E9 rows, > 1.6125E10 cpu, 4.5E9 io, 0.0 network, 0.0 memory}, id = 11726 > 05-02 SelectionVectorRemover : rowType = > RecordType(ANY o_orderdate, ANY o_orderkey, ANY o_orderpriority): rowcount = > 3.75E8, cumulative cost = {3.375E9 rows, 1.5375E10 cpu, 4.5E9 io, 0.0 > network, 0.0 memory}, id = 11725 > 05-03 Filter(condition=[AND(>=($0, > 1996-10-01), <($0, 1997-01-01 00:00:00))]) : rowType = RecordType(ANY > o_orderdate, ANY o_orderkey, ANY o_orderpriority): rowcount = 3.75E8, > cumulative cost = {3.0E9 rows, 1.5E10 cpu, 4.5E9 io, 0.0 network, 0.0 > memory}, id = 11724 > 05-04 Scan(table=[[dfs, tpchpar1000_micro, > orders]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///tpchParquet10/SF1000/orders]], > selectionRoot=maprfs:/tpchParquet10/SF1000/orders, numFiles=1, > numRowGroups=750, usedMetadataFile=false, columns=[`o_orderdate`, > `o_orderkey`, `o_orderpriority`]]]) : rowType = RecordType(ANY o_orderdate, > ANY o_orderkey, ANY o_orderpriority): rowcount = 1.5E9, cumulative cost = > {1.5E9 rows, 4.5E9 cpu, 4.5E9 io, 0.0 network, 0.0 memory}, id = 11723 > 03-06 HashAgg(group=[{0}]) : rowType = > RecordType(ANY l_orderkey): rowcount = 4499992.28175, cumulative cost = > {1.3439976948159996E10 rows, 5.477990604317001E10 cpu, 1.7999969127E10 io, > 1.8431968386048E11 network, 8.711985057468E9 memory}, id = 11722 > 03-07 HashToRandomExchange(dist0=[[$0]]) : > rowType = RecordType(ANY l_orderkey): rowcount = 4.49999228175E7, cumulative > cost = {1.3394977025342497E10 rows, 5.441990666063001E10 cpu, 1.7999969127E10 > io, 1.8431968386048E11 network, 7.919986415880001E9 memory}, id = 11721 > 04-01 HashAgg(group=[{0}]) : rowType = > RecordType(ANY l_orderkey): rowcount = 4.49999228175E7, cumulative cost = > {1.3349977102524998E10 rows, 5.369990789555001E10 cpu, 1.7999969127E10 io, > 0.0 network, 7.919986415880001E9 memory}, id = 11720 > 04-02 Project(l_orderkey=[$0], i=[true]) : > rowType = RecordType(ANY l_orderkey, BOOLEAN i): rowcount = 4.49999228175E8, > cumulative cost = {1.2899977874349998E10 rows, 5.009991407015001E10 cpu, > 1.7999969127E10 io, 0.0 network, 0.0 memory}, id = 11719 > 04-03 SelectionVectorRemover : rowType = > RecordType(ANY l_orderkey, ANY l_commitdate, ANY l_receiptdate): rowcount = > 4.49999228175E8, cumulative cost = {1.2449978646175E10 rows, > 4.784991792927501E10 cpu, 1.7999969127E10 io, 0.0 network, 0.0 memory}, id = > 11718 > 04-04 Filter(condition=[AND(=($0, $0), > <($1, $2))]) : rowType = RecordType(ANY l_orderkey, ANY l_commitdate, ANY > l_receiptdate): rowcount = 4.49999228175E8, cumulative cost = > {1.1999979418E10 rows, 4.7399918701100006E10 cpu, 1.7999969127E10 io, 0.0 > network, 0.0 memory}, id = 11717 > 04-05 Scan(table=[[dfs, > tpchpar1000_micro, lineitem]], groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tpchParquet10/SF1000/lineitem]], > selectionRoot=maprfs:/tpchParquet10/SF1000/lineitem, numFiles=1, > numRowGroups=3250, usedMetadataFile=false, columns=[`l_orderkey`, > `l_commitdate`, `l_receiptdate`]]]) : rowType = RecordType(ANY l_orderkey, > ANY l_commitdate, ANY l_receiptdate): rowcount = 5.999989709E9, cumulative > cost = {5.999989709E9 rows, 1.7999969127E10 cpu, 1.7999969127E10 io, 0.0 > network, 0.0 memory}, id = 11716 > {noformat} > I have attached two profiles. 235a471b-aa97-bfb5-207d-3f25b4b5fbbb is from > commit id 4627973bde9847a4eb2672c44941136c167326a1. This does not have > Statistics code and serves as the baseline. It is the commit prior to the > Statistics commit. 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5 is from commit id > 5c436dbb028b813e80b7b8fcf045af31f0bcf68b. This has the Statistics code with > the fix for disabled Statistics. > I also pulled the logs from the foreman to show the memory limit for Hash > Agg. With the baseline, HashAgg has a limit of 10GB: > {noformat} > 2019-04-04 00:57:27,446 [235a471b-aa97-bfb5-207d-3f25b4b5fbbb:frag:4:142] > TRACE o.a.d.e.p.i.aggregate.HashAggregator - 1st phase. Estimated internal > row width: 8 Values row width: 0 batch size: 720896 memory limit: > 10000000000 max column width: 8 > {noformat} > With statistics disabled, HashAgg has a limit of 64MB: > {noformat} > 2019-04-04 01:32:48,132 [235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5:frag:4:182] > TRACE o.a.d.e.p.i.aggregate.HashAggregator - 1st phase. Estimated internal > row width: 8 Values row width: 0 batch size: 720896 memory limit: 65075262 > max column width: 8 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)