[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm

GitBox Wed, 30 Mar 2022 19:45:38 -0700


EmmyMiao87 commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r839120552




##########
File path: docs/zh-CN/getting-started/advance-usage.md
##########
@@ -207,11 +207,17 @@ mysql> SHOW VARIABLES LIKE "%query_timeout%";
 
 ### 2.3 Broadcast/Shuffle Join
 
-系统默认实现 Join 的方式，是将小表进行条件过滤后，将其广播到大表所在的各个节点上，形成一个内存 Hash 表，然后流式读出大表的数据进行Hash 
Join。但是如果当小表过滤后的数据量无法放入内存的话，此时 Join 将无法完成，通常的报错应该是首先造成内存超限。
+系统提供了两种Join的实现方式，broadcast join和shuffle join（partitioned Join）。
 
-如果遇到上述情况，建议显式指定 Shuffle Join，也被称作 Partitioned Join。即将小表和大表都按照 Join 的 key 进行 
Hash，然后进行分布式的 Join。这个对内存的消耗就会分摊到集群的所有计算节点上。
+broadcast join是指将小表进行条件过滤后，将其广播到大表所在的各个节点上，形成一个内存 Hash 表，然后流式读出大表的数据进行Hash 
Join。
 
-Doris会自动尝试进行 Broadcast Join，如果预估小表过大则会自动切换至 Shuffle Join。注意，如果此时显式指定了 
Broadcast Join 也会自动切换至 Shuffle Join。
+shuffle join是指将小表和大表都按照 Join 的 key 进行 Hash，然后进行分布式的 Join。
+
+当小表的数据量较小时，broadcast join拥有更好的性能。反之，则shuffle join拥有更好的性能。
+
+系统会自动尝试进行 Broadcast 
Join，也可以显式指定每个join算子的实现方式。系统提供了可配置的参数`auto_broadcast_join_threshold`，指定使用broadcast
 join时，hash table使用的内存占整体执行内存比例的上限，取值范围为0到1，默认值为0.8。当系统计算hash 
table使用的内存会超过此限制时，会自动转换为使用shuffle join。注意，此时即使显式指定了 Broadcast Join 也会自动切换至 
Shuffle Join。

Review comment:
       如果用户显著指定 Hint 那是不是还是按照用户指定的走比较好？

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
##########
@@ -97,11 +97,18 @@ public DistributedPlanner(PlannerContext ctx) {
             isPartitioned = true;
         }
         long perNodeMemLimit = ctx_.getQueryOptions().mem_limit;
+        double autoBroadcastJoinThresholdPercentage = 
ctx_.getQueryOptions().getAutoBroadcastJoinThreshold();

Review comment:
       It seams that only hash join node uses the 'autoBroadcastJoinThreshold'.
   So, It is better to hide the parameter acquisition inside the function 
'createHashJoinFragment'

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
##########
@@ -97,11 +97,18 @@ public DistributedPlanner(PlannerContext ctx) {
             isPartitioned = true;
         }
         long perNodeMemLimit = ctx_.getQueryOptions().mem_limit;
+        double autoBroadcastJoinThresholdPercentage = 
ctx_.getQueryOptions().getAutoBroadcastJoinThreshold();
+        if (autoBroadcastJoinThresholdPercentage > 1) {
+            autoBroadcastJoinThresholdPercentage = 1.0;
+        } else if (autoBroadcastJoinThresholdPercentage <= 0) {
+            autoBroadcastJoinThresholdPercentage = -1.0;
+        }
+        long autoBroadcastJoinThreshold = (long)(perNodeMemLimit * 
autoBroadcastJoinThresholdPercentage);
         if (LOG.isDebugEnabled()) {
             LOG.debug("create plan fragments");
-            LOG.debug("memlimit=" + Long.toString(perNodeMemLimit));
+            LOG.debug("auto broadcast threshold = " + 
autoBroadcastJoinThreshold);
         }
-        createPlanFragments(singleNodePlan, isPartitioned, perNodeMemLimit, 
fragments);
+        createPlanFragments(singleNodePlan, isPartitioned, 
autoBroadcastJoinThreshold, fragments);

Review comment:
       Also, the input params 'mem_limit' of function 'createPlanFragments' is 
also not required




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm

Reply via email to