kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640

##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                 TableScan
                   alias: src
                   filterExpr: key is not null (type: boolean)
+                  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
       why is `keyRatio` above 1? shouldn't it mean the expected selectivity of the operation?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }

   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
       I think `skipRSParent` is a bit misleading; you don't want to skip the RS - you want to stay in the same vertex

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
         List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
         ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-        String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-        if (realTSColName != null) {
+        ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+        if (originTSColExpr == null) {
+          LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
       current algorithm seems to be:
       * select best mj candidate
       * do some further processing - which may bail out

       bailing out for the best candidate doesn't necessarily mean that we will still bail out for a less charming candidate - I think it might be worth trying to restructure the extra compilation into the for loop - or instead of selecting the best candidate the first part could be implemented as a priority logic

       just an idea for a followup

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
           String outputColumnName = cSELOutputColumnNames.get(i);
           ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
           ExprNodeDesc newPSELExprNodeDesc =
-              ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+              ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
       instead of modifying every callsite - can we have a method with the original signature?
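
To illustrate the comment on NonBlockingOpDeDupProc.java above: one way to avoid touching every call site is to keep a method with the original four-argument shape that simply delegates to the new five-argument one. The sketch below is not the Hive code; `BacktrackOverloadSketch` is a hypothetical stand-in, expressions and operators are plain strings, and the flag name `stayInSameVertex` is only an assumption inspired by the naming remark on `skipRSParent`.

```java
// Hypothetical stand-in for ExprNodeDescUtils; Hive's ExprNodeDesc and
// Operator types are replaced by plain strings to keep the sketch runnable.
final class BacktrackOverloadSketch {

  // Original four-argument shape, kept as a thin delegate so that existing
  // callers compile unchanged and get the old behaviour (new flag = false).
  static String backtrack(String source, String current, String terminal,
      boolean foldExpr) {
    return backtrack(source, current, terminal, foldExpr, false);
  }

  // New five-argument variant. The flag name is an assumption, not the PR's.
  // The body only records its inputs; the real method walks the operator tree.
  static String backtrack(String source, String current, String terminal,
      boolean foldExpr, boolean stayInSameVertex) {
    return source + "@" + current + "->" + terminal
        + "[fold=" + foldExpr + ",sameVertex=" + stayInSameVertex + "]";
  }

  public static void main(String[] args) {
    // An unchanged call site, as in NonBlockingOpDeDupProc before the patch.
    System.out.println(backtrack("key", "cSEL", "pSEL", true));
  }
}
```

With this shape only the callers that actually need the new behaviour pass the extra argument.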
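
The follow-up idea in the comment on TezCompiler.java (try the next candidate when the best one bails out) could look roughly like the sketch below. Everything in it is hypothetical: `Candidate`, its `score` field, and the `process` callback stand in for the map-join selection and the origin-column lookup, and none of these names come from Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class CandidateFallbackSketch {

  // Hypothetical candidate: a map-join label plus a score (higher is better).
  static final class Candidate {
    final String name;
    final double score;

    Candidate(String name, double score) {
      this.name = name;
      this.score = score;
    }
  }

  // Visit candidates from best to worst and return the first result for which
  // the further processing does not bail out (signalled by an empty Optional).
  static <R> Optional<R> firstUsable(List<Candidate> candidates,
      Function<Candidate, Optional<R>> process) {
    List<Candidate> ordered = new ArrayList<>(candidates);
    ordered.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
    for (Candidate c : ordered) {
      Optional<R> result = process.apply(c);
      if (result.isPresent()) {
        return result;
      }
    }
    return Optional.empty();
  }

  // Demo: processing fails for the best candidate (MJ_A), so MJ_B is used.
  static Optional<String> demo() {
    List<Candidate> candidates = Arrays.asList(
        new Candidate("MJ_B", 0.5), new Candidate("MJ_A", 0.9));
    return firstUsable(candidates, c ->
        c.name.equals("MJ_A") ? Optional.<String>empty() : Optional.of("col_" + c.name));
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Sorting once and looping keeps the "best first" preference while letting a less attractive candidate succeed where the best one fails.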