[
https://issues.apache.org/jira/browse/IMPALA-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947950#comment-17947950
]
Fang-Yu Rao commented on IMPALA-14008:
--------------------------------------
cc: [~kdeschle], [~amansinha], [~msmith], [~rizaon]
> Consider distributing the results of hash join to multiple Impala hosts when
> only one single Impala host is executing the probe side
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-14008
> URL: https://issues.apache.org/jira/browse/IMPALA-14008
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Fang-Yu Rao
> Priority: Major
>
> Currently, when the probe/left side of a hash join is only executed by one
> single Impala host, the results of the hash join will only be distributed to
> one single host for further processing, e.g., aggregation. The performance
> could be improved if multiple Impala hosts could participate in the
> aggregation in such a case.
>
> For instance, for the query "{{{}select straight_join
> count(functional.tinyinttable.int_col) from functional.tinyinttable,
> functional.alltypes where functional.tinyinttable.int_col =
> functional.alltypes.id{}}}", we have the following distributed query plan
> where there is only one Impala host working on the aggregation, if the
> probe/left side of the join is only executed by one Impala host. Note that we
> added "{{{}straight_join{}}}" to force Impala's frontend to preserve the join
> order.
> {code:java}
> Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows
> Peak Mem Est. Peak Mem Detail
> ---------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT 1 1 0.000ns 0.000ns
> 0 0
> 06:AGGREGATE 1 1 0.000ns 0.000ns 1 1
> 16.00 KB 16.00 KB FINALIZE
> 05:EXCHANGE 1 1 0.000ns 0.000ns 1 1
> 16.00 KB 16.00 KB UNPARTITIONED
> F00:EXCHANGE SENDER 1 1 0.000ns 0.000ns
> 0 48.00 KB
> 03:AGGREGATE 1 1 0.000ns 0.000ns 1 1
> 29.00 KB 16.00 KB
> 02:HASH JOIN 1 1 0.000ns 0.000ns 10 5
> 1.98 MB 1.94 MB INNER JOIN, BROADCAST
> |--04:EXCHANGE 1 1 0.000ns 0.000ns 7.30K 7.30K
> 336.00 KB 52.52 KB BROADCAST
> | F01:EXCHANGE SENDER 3 3 0.000ns 0.000ns
> 9.62 KB 32.00 KB
> | 01:SCAN HDFS 3 3 26.669ms 32.003ms 7.30K 7.30K
> 324.00 KB 160.00 MB functional.alltypes
> 00:SCAN HDFS 1 1 4.000ms 4.000ms 10 5
> 29.00 KB 32.00 MB functional.tinyinttable
> {code}
>
> On the other hand, for the same query without the query hint of
> "{{{}straight_join{}}}", there will be multiple Impala hosts executing the
> probe/left side of the hash join in "{{{}select
> count(functional.tinyinttable.int_col) from functional.tinyinttable,
> functional.alltypes where functional.tinyinttable.int_col =
> functional.alltypes.id{}}}" and there will be multiple Impala hosts working
> on the aggregation as shown in the following.
> {code:java}
> Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows
> Peak Mem Est. Peak Mem Detail
> ---------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT 1 1 0.000ns 0.000ns
> 0 0
> 06:AGGREGATE 1 1 0.000ns 0.000ns 1 1
> 16.00 KB 16.00 KB FINALIZE
> 05:EXCHANGE 1 1 0.000ns 0.000ns 3 3
> 32.00 KB 16.00 KB UNPARTITIONED
> F00:EXCHANGE SENDER 3 3 0.000ns 0.000ns
> 31.00 B 48.00 KB
> 03:AGGREGATE 3 3 0.000ns 0.000ns 3 3
> 64.00 KB 16.00 KB
> 02:HASH JOIN 3 3 1.333ms 4.000ms 10 7.30K
> 1.98 MB 1.94 MB INNER JOIN, BROADCAST
> |--04:EXCHANGE 3 3 0.000ns 0.000ns 10 5
> 16.00 KB 16.00 KB BROADCAST
> | F01:EXCHANGE SENDER 1 1 0.000ns 0.000ns
> 118.00 B 32.00 KB
> | 00:SCAN HDFS 1 1 0.000ns 0.000ns 10 5
> 29.00 KB 32.00 MB functional.tinyinttable
> 01:SCAN HDFS 3 3 6.667ms 8.000ms 7.30K 7.30K
> 342.00 KB 160.00 MB functional.alltypes
> {code}
> The example above is indeed a bit contrived, but it could be a real issue if
> somehow Impala produces a join order according to which there will only be
> one single Impala host executing the probe/left side of a hash join. For
> example, Impala could produce such a join order if there are many files in
> the smaller table (the one with smaller cardinality) of the join and as a
> result this smaller table becomes the probe/left side of a hash join.
>
> For easy reference, we can see in
> [Planner.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java],
> the number of hosts executing the hash join depends on the number of hosts
> executing the left child, i.e., the probe side child.
> {code:java}
> public static void invertJoins(PlanNode root, boolean isLocalPlan) {
> ...
> if (root instanceof JoinNode) {
> // Re-compute the numNodes and numInstances based on the new input order
> joinNode.recomputeNodes();
> }
> }
> {code}
> Recall that {{recomputeNodes()}} is defined in
> [JoinNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/JoinNode.java].
> {code}
> /**
> * Reset the numNodes_ and numInstances_ based on the left child
> */
> public void recomputeNodes() {
> numNodes_ = getChild(0).numNodes_;
> numInstances_ = getChild(0).numInstances_;
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]