kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640



##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                 TableScan
                   alias: src
                   filterExpr: key is not null (type: boolean)
+                  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
       Why is `keyRatio` above 1? Shouldn't it represent the expected selectivity of the operation?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }
 
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
       I think `skipRSParent` is a bit misleading; you don't want to skip the RS, you want to stay in the same vertex.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
       List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-      String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-      if (realTSColName != null) {
+      ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+      if (originTSColExpr == null) {
+        LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
       The current algorithm seems to be:
   * select the best MJ candidate
   * do some further processing, which may bail out
   
   Bailing out for the best candidate doesn't necessarily mean that we would also bail out for a less attractive candidate. I think it might be worth restructuring the extra compilation into a for loop; alternatively, instead of selecting only the best candidate, the first part could be implemented as priority logic.
   
   Just an idea for a follow-up.
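
   A minimal sketch of the priority-ordered loop suggested here, for illustration only. It does not use Hive classes: `Candidate`, its `score` field, and `firstViable` are hypothetical names, and the `process` callback stands in for the "further processing" step, returning an empty `Optional` when it bails out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class CandidateFallbackSketch {

  // Hypothetical stand-in for a map-join candidate; not a Hive class.
  static final class Candidate {
    final String name;
    final double score; // higher means more attractive (assumption)

    Candidate(String name, double score) {
      this.name = name;
      this.score = score;
    }
  }

  // Tries candidates from best to worst score and returns the first result
  // for which the processing step does not bail out (i.e. is non-empty).
  static <R> Optional<R> firstViable(List<Candidate> candidates,
      Function<Candidate, Optional<R>> process) {
    List<Candidate> ordered = new ArrayList<>(candidates);
    ordered.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
    for (Candidate c : ordered) {
      Optional<R> result = process.apply(c);
      if (result.isPresent()) {
        return result;
      }
    }
    // Every candidate bailed out.
    return Optional.empty();
  }
}
```

   With this shape, a bail-out on the top-ranked candidate falls through to the next one rather than abandoning the optimization altogether.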

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
             String outputColumnName = cSELOutputColumnNames.get(i);
             ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
             ExprNodeDesc newPSELExprNodeDesc =
-                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
       instead of modifying every callsite - can we have a method with the 
original signature?
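
       One way to keep call sites untouched is an overload with the old parameter list that delegates to the new one. The sketch below shows only the delegation pattern; it is self-contained and uses `String` as a stand-in for `ExprNodeDesc` and `Operator<?>`. The class name and the `stayInSameVertex` parameter name are assumptions for illustration, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

public class BacktrackOverloadSketch {

  // Original four-argument form, kept so existing call sites compile unchanged.
  // It delegates with the new flag set to its default (false).
  static List<String> backtrack(List<String> sources, String current,
      String terminal, boolean foldExpr) {
    return backtrack(sources, current, terminal, foldExpr, false);
  }

  // New five-argument form; only callers that need the new behaviour use it.
  // The body is a placeholder that records which flags were passed.
  static List<String> backtrack(List<String> sources, String current,
      String terminal, boolean foldExpr, boolean stayInSameVertex) {
    List<String> result = new ArrayList<>();
    for (String source : sources) {
      result.add(source + "[fold=" + foldExpr + ",sameVertex=" + stayInSameVertex + "]");
    }
    return result;
  }
}
```

       Existing callers such as the one in `NonBlockingOpDeDupProc` could then keep passing four arguments.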



