Re: [PR] HIVE-27078: Bucket Map Join can hang if the source vertex parallelism is changed by reducer autoparallelism [hive]

via GitHub Wed, 26 Mar 2025 08:47:05 -0700


okumin commented on code in PR #5707:
URL: https://github.com/apache/hive/pull/5707#discussion_r2014456892



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java:
##########
@@ -182,6 +187,30 @@ public static ReduceWork createReduceWork(
     return reduceWork;
   }
 
+  private static boolean hasBucketMapJoin(Operator<? extends OperatorDesc> operator) {
+    if (operator == null) {
+      return false;
+    }
+
+    // Iterate over child operators
+    for (Operator<? extends OperatorDesc> childOp : operator.getChildOperators()) {
+      // Check if this is a MapJoinOperator and is a Bucket Map Join
+      if (childOp instanceof MapJoinOperator) {
+        MapJoinOperator mjOp = (MapJoinOperator) childOp;
+        if (mjOp.getConf().isBucketMapJoin()) {
+          return true; // Found BMJ, no need to check further
+        }
+      }
+
+      // Recursively check children
+      if (hasBucketMapJoin(childOp)) {
+        return true;
+      }

Review Comment:
   Let me check that I understand the root cause correctly; I might be 
misunderstanding. Let's assume the following query.
   
   ```sql
   select s.string_col, count(*)
   from target_table2 t
   inner join (
     select min(date_col) date_col, string_col, decimal_col
     from (
       select date_col, 'pipeline' string_col, min(decimal_col) decimal_col
       from source_table2
       where coalesce(decimal_col,'') = '50000000000000000005905545593'
       group by date_col, string_col
     ) x
     group by string_col, decimal_col
   ) s
   on s.date_col = t.date_col AND s.string_col = t.string_col AND s.decimal_col = t.decimal_col
   group by s.string_col;
   ```
   
   Without HIVE-27078, auto-reducer parallelism (the green cells) is enabled on 
all the reducers.
   <img width="345" alt="image" src="https://github.com/user-attachments/assets/15c226e1-ccfd-4bf5-ae88-30ba8220f94a" />
   
   The original patch tried to turn off `hive.tez.auto.reducer.parallelism`, 
which disables auto-parallelism entirely. That would resolve the hang, but 
unrelated reducers would also be affected.
   <img width="332" alt="image" src="https://github.com/user-attachments/assets/6f7937ca-bf31-4355-8f5e-cb89d80a3108" />
   
   The current patch will likely produce the following DAG. It is better, but 
Reducer 4 will still run without auto-reducer parallelism.
   <img width="342" alt="image" src="https://github.com/user-attachments/assets/03226eaa-dbcd-45ed-9a18-ab4e73220482" />
   
   In my understanding, the ideal DAG should have this shape.
   <img width="343" alt="image" src="https://github.com/user-attachments/assets/0c5c8697-8b97-4cc3-941c-1a8812fda5b4" />
   
   To achieve the final shape, we could decide whether the current ReduceWork 
should be auto-parallelized in the following way. I don't mind how it is 
implemented; this is just an example.
   1. Find the ReduceSinkOperators directly included in the vertex.
   2. Check whether the direct children of those ReduceSinkOperators include a 
BucketMapJoin.
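
   The two steps above could be sketched roughly as follows. This is a toy 
model, not Hive code: `Op`, `reduceSinksInVertex`, and `feedsBucketMapJoin` are 
hypothetical names, and the boolean flags stand in for the real 
`instanceof ReduceSinkOperator` and `getConf().isBucketMapJoin()` checks. It 
assumes that a ReduceSinkOperator marks the vertex boundary, so the walk does 
not descend past one.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model only: Op is a hypothetical stand-in for Hive's Operator class.
   final class BucketMapJoinCheck {

       static final class Op {
           final String name;
           final boolean isReduceSink;    // stands in for `instanceof ReduceSinkOperator`
           final boolean isBucketMapJoin; // stands in for a MapJoinOperator whose conf is a BMJ
           final List<Op> children = new ArrayList<>();

           Op(String name, boolean isReduceSink, boolean isBucketMapJoin) {
               this.name = name;
               this.isReduceSink = isReduceSink;
               this.isBucketMapJoin = isBucketMapJoin;
           }

           // Attach a child and return it, so operator chains can be built fluently.
           Op add(Op child) {
               children.add(child);
               return child;
           }
       }

       // Step 1: collect the ReduceSinks that belong to the vertex rooted at vertexRoot.
       static List<Op> reduceSinksInVertex(Op vertexRoot) {
           List<Op> found = new ArrayList<>();
           collect(vertexRoot, found);
           return found;
       }

       private static void collect(Op op, List<Op> found) {
           for (Op child : op.children) {
               if (child.isReduceSink) {
                   found.add(child); // vertex boundary: do not descend further
               } else {
                   collect(child, found);
               }
           }
       }

       // Step 2: true only if a direct child of one of those ReduceSinks is a BMJ.
       static boolean feedsBucketMapJoin(Op vertexRoot) {
           for (Op rs : reduceSinksInVertex(vertexRoot)) {
               for (Op child : rs.children) {
                   if (child.isBucketMapJoin) {
                       return true;
                   }
               }
           }
           return false;
       }
   }
   ```

   With a chain such as GBY_1 -> RS_1 -> GBY_2 -> RS_2 -> MAPJOIN (BMJ), this 
check is true only for the vertex rooted at GBY_2, whereas a fully recursive 
search over all descendants would also flag the vertex rooted at GBY_1.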



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


