ngsg commented on code in PR #5717:
URL: https://github.com/apache/hive/pull/5717#discussion_r2071722757


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java:
##########
@@ -1427,12 +1426,29 @@ private static SharedResult extractSharedOptimizationInfoForRoot(ParseContext pc
     if (equalOp1.getNumChild() > 1 || equalOp2.getNumChild() > 1) {
       // TODO: Support checking multiple child operators to merge further.
       discardableInputOps.addAll(gatherDPPBranchOps(pctx, optimizerCache, discardableOps));
-      return new SharedResult(retainableOps, discardableOps, discardableInputOps,
-          dataSize, maxDataSize);
+
+      // Accumulate InMemoryDataSize of unmerged MapJoin operators.
+      Set<Operator<?>> opsWork1 = findWorkOperators(optimizerCache, retainableTsOp);
+      for (Operator<?> op : opsWork1) {
+        if (op instanceof MapJoinOperator) {
+          MapJoinOperator mop = (MapJoinOperator) op;
+          dataSize = StatsUtils.safeAdd(dataSize, mop.getConf().getInMemoryDataSize());
+          maxDataSize = mop.getConf().getMemoryMonitorInfo().getAdjustedNoConditionalTaskSize();
+        }
+      }
+      Set<Operator<?>> opsWork2 = findWorkOperators(optimizerCache, discardableTsOp);

Review Comment:
   At this point, we have detected that the two TableScan operators are
identical (i.e., they can be merged). However, at least one of them has multiple
child operators, which prevents us from merging their descendant operators. As a
result, the total memory consumed by the MapJoin operators in the merged vertex
must account for the descendant MapJoin operators of both the retainable and the
discardable TableScan operators.
   
   For example, if we merge TS1-{Join1, Join2} and TS2-{Join3, Join4}, the
merged operator graph becomes TS1-{Join1, Join2, Join3, Join4}, so we must also
account for Join3 and Join4, which come from the discarded TableScan operator TS2.
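
   To make the accounting concrete, here is a minimal standalone sketch of the
TS1/TS2 example. It does not use Hive classes: `MapJoin`, `saturatingAdd`, and
`totalInMemoryDataSize` are hypothetical stand-ins, the sizes and the limit are
made-up numbers, and `saturatingAdd` only assumes the overflow-clamping intent
of `StatsUtils.safeAdd`.

```java
import java.util.Arrays;
import java.util.List;

public class MergedVertexMemorySketch {

  /** Hypothetical stand-in for a MapJoin operator; not Hive's MapJoinOperator. */
  static final class MapJoin {
    final String name;
    final long inMemoryDataSize;

    MapJoin(String name, long inMemoryDataSize) {
      this.name = name;
      this.inMemoryDataSize = inMemoryDataSize;
    }
  }

  /** Adds two non-negative sizes, clamping to Long.MAX_VALUE on overflow (assumed behavior). */
  static long saturatingAdd(long a, long b) {
    try {
      return Math.addExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  /** Sums the in-memory sizes of the MapJoins under both the retained and the discarded scan. */
  static long totalInMemoryDataSize(long initial,
                                    List<MapJoin> retainableWork,
                                    List<MapJoin> discardableWork) {
    long total = initial;
    for (MapJoin mj : retainableWork) {
      total = saturatingAdd(total, mj.inMemoryDataSize);
    }
    // The joins below the discarded TableScan end up in the same merged vertex,
    // so they are counted as well.
    for (MapJoin mj : discardableWork) {
      total = saturatingAdd(total, mj.inMemoryDataSize);
    }
    return total;
  }

  public static void main(String[] args) {
    List<MapJoin> underTs1 = Arrays.asList(new MapJoin("Join1", 100L), new MapJoin("Join2", 200L));
    List<MapJoin> underTs2 = Arrays.asList(new MapJoin("Join3", 300L), new MapJoin("Join4", 400L));
    long maxDataSize = 900L; // made-up limit for illustration

    long total = totalInMemoryDataSize(0L, underTs1, underTs2);
    // prints: total=1000, fits=false
    System.out.println("total=" + total + ", fits=" + (total <= maxDataSize));
  }
}
```

   With these numbers, counting only Join1 and Join2 (300) would appear to fit
under the limit, while the merged vertex actually needs 1000.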



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

