[jira] [Commented] (IMPALA-14008) Consider distributing the results of hash join to multiple Impala hosts when only one single Impala host is executing the probe side

Fang-Yu Rao (Jira) Mon, 28 Apr 2025 14:40:04 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947950#comment-17947950
 ]


Fang-Yu Rao commented on IMPALA-14008:
--------------------------------------

cc: [~kdeschle], [~amansinha], [~msmith], [~rizaon]

> Consider distributing the results of hash join to multiple Impala hosts when 
> only one single Impala host is executing the probe side
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14008
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14008
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Fang-Yu Rao
>            Priority: Major
>
> Currently, when the probe/left side of a hash join is only executed by one 
> single Impala host, the results of the hash join will only be distributed to 
> one single host for further processing, e.g., aggregation. The performance 
> could be improved if multiple Impala hosts could participate in the 
> aggregation in such a case.
>  
> For instance, for the query "{{{}select straight_join 
> count(functional.tinyinttable.int_col) from functional.tinyinttable, 
> functional.alltypes where functional.tinyinttable.int_col = 
> functional.alltypes.id{}}}", we have the following distributed query plan 
> where there is only one Impala host working on the aggregation, if the 
> probe/left side of the join is only executed by one Impala host. Note that we 
> added "{{{}straight_join{}}}" to force Impala's frontend to preserve the join 
> order.
> {code:java}
> Operator                 #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows 
>   Peak Mem  Est. Peak Mem  Detail                  
> ---------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT                      1      1   0.000ns   0.000ns                    
>          0              0                          
> 06:AGGREGATE                  1      1   0.000ns   0.000ns      1           1 
>   16.00 KB       16.00 KB  FINALIZE                
> 05:EXCHANGE                   1      1   0.000ns   0.000ns      1           1 
>   16.00 KB       16.00 KB  UNPARTITIONED           
> F00:EXCHANGE SENDER           1      1   0.000ns   0.000ns                    
>          0       48.00 KB                          
> 03:AGGREGATE                  1      1   0.000ns   0.000ns      1           1 
>   29.00 KB       16.00 KB                          
> 02:HASH JOIN                  1      1   0.000ns   0.000ns     10           5 
>    1.98 MB        1.94 MB  INNER JOIN, BROADCAST   
> |--04:EXCHANGE                1      1   0.000ns   0.000ns  7.30K       7.30K 
>  336.00 KB       52.52 KB  BROADCAST               
> |  F01:EXCHANGE SENDER        3      3   0.000ns   0.000ns                    
>    9.62 KB       32.00 KB                          
> |  01:SCAN HDFS               3      3  26.669ms  32.003ms  7.30K       7.30K 
>  324.00 KB      160.00 MB  functional.alltypes     
> 00:SCAN HDFS                  1      1   4.000ms   4.000ms     10           5 
>   29.00 KB       32.00 MB  functional.tinyinttable 
> {code}
>  
> On the other hand, for the same query without the query hint of 
> "{{{}straight_join{}}}", there will be multiple Impala hosts executing the 
> probe/left side of the hash join in "{{{}select 
> count(functional.tinyinttable.int_col) from functional.tinyinttable, 
> functional.alltypes where functional.tinyinttable.int_col = 
> functional.alltypes.id{}}}" and there will be multiple Impala hosts working 
> on the aggregation as shown in the following.
> {code:java}
> Operator                 #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows 
>   Peak Mem  Est. Peak Mem  Detail                  
> ---------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT                      1      1   0.000ns   0.000ns                    
>          0              0                          
> 06:AGGREGATE                  1      1   0.000ns   0.000ns      1           1 
>   16.00 KB       16.00 KB  FINALIZE                
> 05:EXCHANGE                   1      1   0.000ns   0.000ns      3           3 
>   32.00 KB       16.00 KB  UNPARTITIONED           
> F00:EXCHANGE SENDER           3      3   0.000ns   0.000ns                    
>    31.00 B       48.00 KB                          
> 03:AGGREGATE                  3      3   0.000ns   0.000ns      3           3 
>   64.00 KB       16.00 KB                          
> 02:HASH JOIN                  3      3   1.333ms   4.000ms     10       7.30K 
>    1.98 MB        1.94 MB  INNER JOIN, BROADCAST   
> |--04:EXCHANGE                3      3   0.000ns   0.000ns     10           5 
>   16.00 KB       16.00 KB  BROADCAST               
> |  F01:EXCHANGE SENDER        1      1   0.000ns   0.000ns                    
>   118.00 B       32.00 KB                          
> |  00:SCAN HDFS               1      1   0.000ns   0.000ns     10           5 
>   29.00 KB       32.00 MB  functional.tinyinttable 
> 01:SCAN HDFS                  3      3   6.667ms   8.000ms  7.30K       7.30K 
>  342.00 KB      160.00 MB  functional.alltypes  
> {code}
> The example above is indeed a bit contrived, but it could be a real issue if 
> somehow Impala produces a join order according to which there will only be 
> one single Impala host executing the probe/left side of a hash join. For 
> example, Impala could produce such a join order if there are many files in 
> the smaller table (the one with smaller cardinality) of the join and as a 
> result this smaller table becomes the probe/left side of a hash join.
>  
> For easy reference, we can see in 
> [Planner.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java],
>  the number of hosts executing the hash join depends on the number of hosts 
> executing the left child, i.e., the probe side child.
> {code:java}
>   public static void invertJoins(PlanNode root, boolean isLocalPlan) {
> ...
>     if (root instanceof JoinNode) {
>       // Re-compute the numNodes and numInstances based on the new input order
>       joinNode.recomputeNodes();
>     }
>   }
> {code}
> Recall that {{recomputeNodes()}} is defined in 
> [JoinNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/JoinNode.java].
> {code}
>   /**
>    * Reset the numNodes_ and numInstances_ based on the left child
>    */
>   public void recomputeNodes() {
>     numNodes_ = getChild(0).numNodes_;
>     numInstances_ = getChild(0).numInstances_;
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-14008) Consider distributing the results of hash join to multiple Impala hosts when only one single Impala host is executing the probe side

Reply via email to