[GitHub] [hive] amansinha100 commented on a diff in pull request #3504: add missing duplicates of join keys to RS schema

GitBox Thu, 04 Aug 2022 20:06:49 -0700


amansinha100 commented on code in PR #3504:
URL: https://github.com/apache/hive/pull/3504#discussion_r938400054



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -9436,24 +9436,21 @@ private Operator genJoinReduceSinkChild(ExprNodeDesc[] joinKeys,
 
       // backtrack can be null when input is script operator
       ExprNodeDesc exprBack = ExprNodeDescUtils.backtrack(expr, dummy, parent);
-      int kindex;
-      if (exprBack == null) {
-        kindex = -1;
-      } else if (ExprNodeDescUtils.isConstant(exprBack)) {
-        kindex = reduceKeysBack.indexOf(exprBack);
-      } else {
-        kindex = ExprNodeDescUtils.indexOf(exprBack, reduceKeysBack);
-      }
-      if (kindex >= 0) {
-        ColumnInfo newColInfo = new ColumnInfo(colInfo);
-        String internalColName = Utilities.ReduceField.KEY + ".reducesinkkey" + kindex;
-        newColInfo.setInternalName(internalColName);
-        newColInfo.setTabAlias(nm[0]);
-        outputRR.put(nm[0], nm[1], newColInfo);
-        if (nm2 != null) {
-          outputRR.addMappingOnly(nm2[0], nm2[1], newColInfo);
+      if (exprBack != null) {
+        if (ExprNodeDescUtils.isConstant(exprBack)) {
+          int kindex = reduceKeysBack.indexOf(exprBack);
+          addJoinKeyToRowScema(outputRR, index, i, colInfo, nm, nm2, kindex);
+        } else {
+          int startIdx = 0;
+          int kindex;
+          // joinKey may present multiple times, add the duplicates to the schema with different internal name
+          //      join        LU_CUSTOMER        a16
+          //      on         (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID)

Review Comment:
   For clarity, could you please add the internal names that would be produced
for the two occurrences of a16.CUSTOMER_ID?
   Also, the duplicate occurrences could be in a WHERE clause instead of the ON
clause. Can we verify that this case works as well?
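
   To make the request concrete, here is a minimal sketch of how one internal
name per duplicate occurrence could be derived. It is not the PR's code: plain
strings stand in for ExprNodeDesc, the class and method names are hypothetical,
and the "KEY.reducesinkkey<N>" pattern is assumed from the string built in the
removed lines.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateJoinKeyNames {
  // Assumption: same pattern as Utilities.ReduceField.KEY + ".reducesinkkey" + kindex.
  static final String PREFIX = "KEY.reducesinkkey";

  /** Returns one internal name for every position at which expr occurs in reduceKeys. */
  static List<String> internalNamesFor(String expr, List<String> reduceKeys) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < reduceKeys.size(); i++) {
      if (reduceKeys.get(i).equals(expr)) {
        names.add(PREFIX + i); // each duplicate keeps its own key position
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // a16.CUSTOMER_ID appears twice among the reduce keys of the a16 side.
    List<String> reduceKeys = List.of("a16.CUSTOMER_ID", "a16.CUSTOMER_ID");
    System.out.println(internalNamesFor("a16.CUSTOMER_ID", reduceKeys));
    // prints [KEY.reducesinkkey0, KEY.reducesinkkey1]
  }
}
```

   Under these assumptions the two occurrences would map to key positions 0 and
1. A code comment stating the actual names would confirm whether the PR behaves
the same way.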



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -9528,6 +9525,19 @@ private Operator genJoinReduceSinkChild(ExprNodeDesc[] joinKeys,
     return rsOp;
   }
 
+  private void addJoinKeyToRowScema(

Review Comment:
   nit: spelling: should be RowSchema (the 'h' is missing)



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -9436,24 +9436,21 @@ private Operator genJoinReduceSinkChild(ExprNodeDesc[] joinKeys,
 
       // backtrack can be null when input is script operator
       ExprNodeDesc exprBack = ExprNodeDescUtils.backtrack(expr, dummy, parent);
-      int kindex;
-      if (exprBack == null) {
-        kindex = -1;
-      } else if (ExprNodeDescUtils.isConstant(exprBack)) {
-        kindex = reduceKeysBack.indexOf(exprBack);
-      } else {
-        kindex = ExprNodeDescUtils.indexOf(exprBack, reduceKeysBack);
-      }
-      if (kindex >= 0) {
-        ColumnInfo newColInfo = new ColumnInfo(colInfo);
-        String internalColName = Utilities.ReduceField.KEY + ".reducesinkkey" + kindex;
-        newColInfo.setInternalName(internalColName);
-        newColInfo.setTabAlias(nm[0]);
-        outputRR.put(nm[0], nm[1], newColInfo);
-        if (nm2 != null) {
-          outputRR.addMappingOnly(nm2[0], nm2[1], newColInfo);
+      if (exprBack != null) {
+        if (ExprNodeDescUtils.isConstant(exprBack)) {
+          int kindex = reduceKeysBack.indexOf(exprBack);
+          addJoinKeyToRowScema(outputRR, index, i, colInfo, nm, nm2, kindex);

Review Comment:
   Previously we checked that kindex >= 0 before adding to the output row
schema. We should check here as well, since indexOf returns -1 when the
constant is not among the reduce keys.
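
   A minimal sketch of the guard being asked for, under the same assumptions as
the original code path: strings stand in for ExprNodeDesc, and the class and
method names are hypothetical rather than taken from the PR.

```java
import java.util.List;
import java.util.Optional;

public class ConstantKeyGuard {
  /**
   * Returns the internal name for a constant join key, or empty when the
   * constant is not among the reduce keys (List.indexOf returns -1).
   */
  static Optional<String> internalNameForConstant(String constantExpr, List<String> reduceKeys) {
    int kindex = reduceKeys.indexOf(constantExpr);
    if (kindex < 0) {
      // Without this check the name would become "KEY.reducesinkkey-1".
      return Optional.empty();
    }
    return Optional.of("KEY.reducesinkkey" + kindex);
  }

  public static void main(String[] args) {
    List<String> reduceKeys = List.of("a16.CUSTOMER_ID", "1");
    System.out.println(internalNameForConstant("1", reduceKeys));
    System.out.println(internalNameForConstant("2", reduceKeys));
  }
}
```

   In the PR itself, the equivalent would be to wrap the call in the constant
branch in an `if (kindex >= 0)` check, as the removed code did.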



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
