[
https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084309#comment-16084309
]
Jinfeng Ni commented on DRILL-5468:
-----------------------------------
The difference between the plan prior to DRILL-4678 and the plan on latest
master is that the join with the IN-clause subquery switched from a broadcast
join (prior to DRILL-4678) to a distributed join (latest master). The new plan
actually makes more sense, given how Drill's cost estimation works; prior to
DRILL-4678 it was largely luck that Drill ended up with a broadcast join plan.
Here is the detailed analysis.
A simplified query to reproduce the problem:
{code}
select
  count(*)
from
  orders_hive_int32 o
where
  o.o_orderkey in (
    select
      l_orderkey
    from
      lineitem_hive_int32
    group by
      l_orderkey
    having
      sum(l_quantity) > 300
  )
{code}
Notice that the IN-clause subquery has a GROUP BY on "l_orderkey", which is the
same as the comparison key of the IN-clause. For an IN-clause subquery, Calcite
puts an Aggregate over the comparison key to remove duplicates. In this case the
subquery's GROUP BY key is the same as the comparison key, so ideally the
planner should remove the redundant Aggregate. Prior to DRILL-4678, Drill's
planner failed to recognize this redundancy and kept the redundant Aggregate.
Here is the {{logical plan}} prior to DRILL-4678. Notice that there are two
aggregates operating on the same key, with the top one simply doing duplicate
removal.
{code}
DrillAggregateRel(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = {1.919693144E8 rows, 1.8386736168E9 cpu, 0.0 io, 0.0 network, 1.29332996672E9 memory}, id = 175
  DrillProjectRel($f0=[0]): rowcount = 1.5E7, cumulative cost = {1.769693144E8 rows, 1.6586736168E9 cpu, 0.0 io, 0.0 network, 1.16132996672E9 memory}, id = 174
    DrillJoinRel(condition=[=($0, $1)], joinType=[inner]): rowcount = 1.5E7, cumulative cost = {1.619693144E8 rows, 1.5986736168E9 cpu, 0.0 io, 0.0 network, 1.16132996672E9 memory}, id = 173
      DrillScanRel(table=[[dfs, tpchsf10, orders]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/drill/testdata/tpch-sf10/orders]], selectionRoot=file:/drill/testdata/tpch-sf10/orders, numFiles=1, usedMetadataFile=false, columns=[`o_orderkey`]]]): rowcount = 1.5E7, cumulative cost = {1.5E7 rows, 1.5E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 166
      DrillAggregateRel(group=[{0}]): rowcount = 2999302.6, cumulative cost = {1.289700118E8 rows, 1.379679196E9 cpu, 0.0 io, 0.0 network, 1.10854224096E9 memory}, id = 172
        DrillProjectRel(l_orderkey=[$0]): rowcount = 2999302.6, cumulative cost = {1.259707092E8 rows, 1.3556847752E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 171
          DrillFilterRel(condition=[>($1, 300)]): rowcount = 2999302.6, cumulative cost = {1.259707092E8 rows, 1.3556847752E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 170
            DrillAggregateRel(group=[{0}], agg#0=[SUM($1)]): rowcount = 5998605.2, cumulative cost = {1.19972104E8 rows, 1.319693144E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 169
              DrillScanRel(table=[[dfs, tpchsf10, lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/drill/testdata/tpch-sf10/lineitem]], selectionRoot=file:/drill/testdata/tpch-sf10/lineitem, numFiles=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`]]]): rowcount = 5.9986052E7, cumulative cost = {5.9986052E7 rows, 1.19972104E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 155
{code}
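In SQL terms, the plan above corresponds roughly to the following rewritten form
of the query (a sketch only, not the exact rel tree the planner builds; table
and column names are those of the simplified query above). The inner GROUP BY
comes from the HAVING subquery, and the outer SELECT DISTINCT stands in for the
duplicate-removal Aggregate that Calcite adds on the comparison key:
{code}
select
  count(*)
from
  orders_hive_int32 o
join (
  -- duplicate-removal aggregate added by the IN-clause rewrite; redundant
  -- here because the subquery already groups by l_orderkey
  select distinct l_orderkey
  from (
    select
      l_orderkey
    from
      lineitem_hive_int32
    group by
      l_orderkey
    having
      sum(l_quantity) > 300
  ) filtered
) dedup
on o.o_orderkey = dedup.l_orderkey
{code}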
The redundant Aggregate in theory incurs additional cost. On the other hand,
Drill's cost estimation reduces the estimated rowcount to 10% for every
aggregate operator (because Drill does not collect or maintain distinct row
count statistics). This extra 10% reduction makes the planner prefer a
broadcast join instead of a distribution-based join.
With DRILL-4678, the new RelMetadataProvider code correctly detects that the
two aggregate operators are applied to the same set of keys, so the second
aggregate is removed. However, the planner then no longer applies the extra 10%
reduction, and hence it prefers a distribution-based join instead of a
broadcast join.
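To make the effect of the extra 10% reduction concrete, here is a
back-of-the-envelope comparison, written as plain SQL arithmetic for
readability (not meant to be run as a Drill query). All constants are
hypothetical: roughly the scan estimates from the plans, an assumed 10%
reduction per aggregate, a 10-node cluster, and network volume as the only
cost:
{code}
select
  -- build side with the redundant aggregate: 6.0E7 * 0.1 * 0.1 = 6.0E5 rows
  60000000 * 0.1 * 0.1 * 10       as broadcast_cost_old,   -- build rows copied to every node = 6.0E6
  15000000 + 60000000 * 0.1 * 0.1 as distributed_cost_old, -- both sides shuffled once        = 1.56E7
  -- build side after DRILL-4678 removes the redundant aggregate: 6.0E7 * 0.1 = 6.0E6 rows
  60000000 * 0.1 * 10             as broadcast_cost_new,   -- = 6.0E7
  15000000 + 60000000 * 0.1       as distributed_cost_new  -- = 2.1E7
{code}
Under this toy model the doubly-reduced estimate makes the broadcast join look
cheaper (6.0E6 vs 1.56E7), while the single reduction tips the choice to the
distributed join (2.1E7 vs 6.0E7), which matches the plan change described
above.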
Here is the {{logical plan}} on latest master:
{code}
DrillAggregateRel(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = {1.889700118E8 rows, 1.814679196E9 cpu, 0.0 io, 0.0 network, 1.24054224096E9 memory}, id = 279
  DrillProjectRel($f0=[0]): rowcount = 1.5E7, cumulative cost = {1.739700118E8 rows, 1.634679196E9 cpu, 0.0 io, 0.0 network, 1.10854224096E9 memory}, id = 278
    DrillJoinRel(condition=[=($0, $1)], joinType=[inner]): rowcount = 1.5E7, cumulative cost = {1.589700118E8 rows, 1.574679196E9 cpu, 0.0 io, 0.0 network, 1.10854224096E9 memory}, id = 277
      DrillScanRel(table=[[dfs, tpchsf10, orders]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/drill/testdata/tpch-sf10/orders]], selectionRoot=file:/drill/testdata/tpch-sf10/orders, numFiles=1, usedMetadataFile=false, columns=[`o_orderkey`]]]): rowcount = 1.5E7, cumulative cost = {1.5E7 rows, 1.5E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 271
      DrillProjectRel(l_orderkey=[$0]): rowcount = 2999302.6, cumulative cost = {1.259707092E8 rows, 1.3556847752E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 276
        DrillFilterRel(condition=[>($1, 300)]): rowcount = 2999302.6, cumulative cost = {1.259707092E8 rows, 1.3556847752E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 275
          DrillAggregateRel(group=[{0}], agg#0=[SUM($1)]): rowcount = 5998605.2, cumulative cost = {1.19972104E8 rows, 1.319693144E9 cpu, 0.0 io, 0.0 network, 1.0557545152E9 memory}, id = 274
            DrillScanRel(table=[[dfs, tpchsf10, lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/drill/testdata/tpch-sf10/lineitem]], selectionRoot=file:/drill/testdata/tpch-sf10/lineitem, numFiles=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`]]]): rowcount = 5.9986052E7, cumulative cost = {5.9986052E7 rows, 1.19972104E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 259
{code}
To summarize, the patch for DRILL-4678 seems to do the right thing in
identifying the redundant aggregation. It is Drill's missing distinct row count
statistics that causes the problem; prior to DRILL-4678, Drill was simply lucky
enough to get the better plan.
> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
> Key: DRILL-5468
> URL: https://issues.apache.org/jira/browse/DRILL-5468
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.11.0
> Environment: 10+1 node ucs-micro cluster RHEL6.4
> Reporter: Dechang Gu
> Assignee: Jinfeng Ni
> Fix For: 1.11.0
>
> Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit id 841ead4) TPCH Q18 on
> SF100 parquet dataset took ~81 secs, while the same query on 1.10.0 took only
> ~27 secs. The query time on the commit adbf363 which is right before 841ead4
> is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will be
> uploaded)