ngsg commented on code in PR #5717:
URL: https://github.com/apache/hive/pull/5717#discussion_r2071722757
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java:
##########
@@ -1427,12 +1426,29 @@ private static SharedResult extractSharedOptimizationInfoForRoot(ParseContext pc
       if (equalOp1.getNumChild() > 1 || equalOp2.getNumChild() > 1) {
         // TODO: Support checking multiple child operators to merge further.
         discardableInputOps.addAll(gatherDPPBranchOps(pctx, optimizerCache, discardableOps));
-        return new SharedResult(retainableOps, discardableOps, discardableInputOps,
-            dataSize, maxDataSize);
+
+        // Accumulate InMemoryDataSize of unmerged MapJoin operators.
+        Set<Operator<?>> opsWork1 = findWorkOperators(optimizerCache, retainableTsOp);
+        for (Operator<?> op : opsWork1) {
+          if (op instanceof MapJoinOperator) {
+            MapJoinOperator mop = (MapJoinOperator) op;
+            dataSize = StatsUtils.safeAdd(dataSize, mop.getConf().getInMemoryDataSize());
+            maxDataSize = mop.getConf().getMemoryMonitorInfo().getAdjustedNoConditionalTaskSize();
+          }
+        }
+        Set<Operator<?>> opsWork2 = findWorkOperators(optimizerCache, discardableTsOp);

Review Comment:
At this point, we have detected that the two TableScan operators are identical, i.e., they can be merged. However, at least one of them has multiple child operators, which prevents us from merging their descendant operators. As a result, the total memory consumed by MapJoin operators in the merged vertex must account for the descendant MapJoin operators of both the retainable and the discardable TableScan operators.

For example, if we merge TS1-{Join1, Join2} and TS2-{Join3, Join4}, the merged operator graph becomes TS1-{Join1, Join2, Join3, Join4}, so we also need to account for Join3 and Join4, which come from the discarded TableScan operator TS2.
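A minimal sketch of the accounting described in the comment, written as a standalone Java model rather than against Hive's classes. The `MapJoin` class, `mergedDataSize`, and the byte sizes are hypothetical; `safeAdd` is an assumed stand-in that saturates at `Long.MAX_VALUE`, in the spirit of how `StatsUtils.safeAdd` is used in the diff. Only the `dataSize` sum is modeled; the diff's `maxDataSize` update is left out.

```java
import java.util.List;

/**
 * Hypothetical model (not Hive's API): memory estimate for a vertex produced by
 * merging two equivalent TableScans whose descendant MapJoins are NOT merged.
 */
public class MergedMapJoinMemory {

    /** Hypothetical stand-in for a MapJoin operator and its in-memory data size in bytes. */
    static final class MapJoin {
        final String name;
        final long inMemoryDataSize;

        MapJoin(String name, long inMemoryDataSize) {
            this.name = name;
            this.inMemoryDataSize = inMemoryDataSize;
        }
    }

    /** Assumed saturating addition: returns Long.MAX_VALUE instead of overflowing. */
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    /**
     * Sums the in-memory sizes of the MapJoins under BOTH TableScans, because after
     * the merge they all run in the same vertex, e.g. TS1-{Join1, Join2, Join3, Join4}.
     */
    static long mergedDataSize(List<MapJoin> retainableJoins, List<MapJoin> discardableJoins) {
        long dataSize = 0L;
        for (MapJoin mj : retainableJoins) {
            dataSize = safeAdd(dataSize, mj.inMemoryDataSize);
        }
        // The joins inherited from the discarded TableScan (TS2) must be counted too.
        for (MapJoin mj : discardableJoins) {
            dataSize = safeAdd(dataSize, mj.inMemoryDataSize);
        }
        return dataSize;
    }

    public static void main(String[] args) {
        List<MapJoin> ts1 = List.of(new MapJoin("Join1", 100L), new MapJoin("Join2", 200L));
        List<MapJoin> ts2 = List.of(new MapJoin("Join3", 300L), new MapJoin("Join4", 400L));

        long retainableOnly = mergedDataSize(ts1, List.of());
        long merged = mergedDataSize(ts1, ts2);
        System.out.println("retainable side only: " + retainableOnly + " bytes");
        System.out.println("merged vertex:        " + merged + " bytes");
    }
}
```

Running `main` prints 300 bytes for the retainable side alone and 1000 bytes for the merged vertex, which shows how ignoring Join3 and Join4 would underestimate the merged vertex's memory by the full size of TS2's joins.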