zabetak commented on code in PR #6495:
URL: https://github.com/apache/hive/pull/6495#discussion_r3266254973


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
##########
@@ -577,6 +570,37 @@ public static List<RexNode> 
transformInToOrOperands(List<RexNode> operands, RexB
     return disjuncts;
   }
 
+  /**
+   * This method tries to rewrite IN expression arguments into an equivalent 
call.
+   * If there are only two elements, generates an EQUALS:
+   * IN [A,B] => EQUALS [A,B]
+   * Otherwise, tries to generate a SEARCH:
+   * IN [A,B,C] => SEARCH(A, SARG([B..B], [C..C]))
+   * If this is not possible (e.g., argument types not sufficiently compatible 
to generate a Calcite SEARCH expression),
+   * tries to generate an OR expression:
+   * IN [A,B,C] => OR [EQUALS [A,B], EQUALS [A,C]]
+   * If this is not possible (e.g., non-deterministic calls are found in the 
expressions), returns null
+   */
+  public static RexNode rewriteInClause(List<RexNode> childRexNodeLst, 
RexBuilder rexBuilder) {
+    if (childRexNodeLst.size() == 2) {
+      return rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, childRexNodeLst);
+    }

Review Comment:
   Should this become part of `RexBuilder#makeIn` API if not already the case?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -267,27 +267,19 @@ public RexNode getExpression(String functionText, 
FunctionInfo fi,
         // If it is a floor <date> operator, we need to rewrite it
         inputs = RexNodeConverter.rewriteFloorDateChildren(calciteOp, inputs, 
rexBuilder);
       } else if (HiveIn.INSTANCE.equals(calciteOp)) {
-        // if it is a single item in an IN clause, transform A IN (B) to A = B
-        // from IN [A,B] => EQUALS [A,B]
-        // if it is more than an single item in an IN clause,
-        // transform from IN [A,B,C] => OR [EQUALS [A,B], EQUALS [A,C]]
-        // Rewrite to OR is done only if number of operands are less than
-        // the threshold configured
-        boolean rewriteToOr = true;
-        if(maxNodesForInToOrTransformation != 0) {
-          if(inputs.size() > maxNodesForInToOrTransformation) {
-            rewriteToOr = false;
-          }
-        }
-        if(rewriteToOr) {
-          // If there are non-deterministic functions, we cannot perform this 
rewriting
-          List<RexNode> newInputs = 
RexNodeConverter.transformInToOrOperands(inputs, rexBuilder);
-          if (newInputs != null) {
-            inputs = newInputs;
-            if (inputs.size() == 1) {
-              inputs.add(rexBuilder.makeLiteral(false));
+        RexNode rewritten = RexNodeConverter.rewriteInClause(inputs, 
rexBuilder);
+        if (rewritten != null) {
+          assert rewritten instanceof RexCall;
+          RexCall call = (RexCall) rewritten;
+          if (call.getKind() == SqlKind.OR && maxNodesForInToOrTransformation 
!= 0) {
+            // Rewrite to OR is done only if number of operands are less than 
the threshold configured
+            if (call.getOperands().size() <= maxNodesForInToOrTransformation) {

Review Comment:
   It's strange to have threshold blockers after performing the transformation. 
I think the initial purpose of the threshold was to prevent a performance hit 
and the exponential explosion of disjunctions generated by the transformation.
   
   As part of this PR we should determine what happens with 
`hive.optimize.transform.in.maxnodes` property. Is it still relevant/necessary 
or are we planning to deprecate and remove it eventually?
   



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
##########
@@ -259,19 +259,12 @@ private RexNode convert(ExprNodeGenericFuncDesc func) 
throws SemanticException {
         // If it is a floor <date> operator, we need to rewrite it
         childRexNodeLst = rewriteFloorDateChildren(calciteOp, childRexNodeLst, 
rexBuilder);
       } else if (HiveIn.INSTANCE.equals(calciteOp) && isAllPrimitive) {
-        if (childRexNodeLst.size() == 2) {
-          // if it is a single item in an IN clause, transform A IN (B) to A = 
B
-          // from IN [A,B] => EQUALS [A,B]
-          // except complex types
-          calciteOp = SqlStdOperatorTable.EQUALS;
-        } else if (RexUtil.isReferenceOrAccess(childRexNodeLst.get(0), true)){
-          // if it is more than an single item in an IN clause,
-          // transform from IN [A,B,C] => OR [EQUALS [A,B], EQUALS [A,C]]
-          // except complex types
-          // Rewrite to OR is done only if number of operands are less than
-          // the threshold configured
-          childRexNodeLst = rewriteInClauseChildren(calciteOp, 
childRexNodeLst, rexBuilder);
-          calciteOp = SqlStdOperatorTable.OR;
+        if (childRexNodeLst.size() == 2 || 
RexUtil.isReferenceOrAccess(childRexNodeLst.get(0), true)) {

Review Comment:
   Why do we limit the transformation? What are the pros/cons of doing an 
unconditional rewrite?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
##########
@@ -577,6 +570,37 @@ public static List<RexNode> 
transformInToOrOperands(List<RexNode> operands, RexB
     return disjuncts;
   }
 
+  /**
+   * This method tries to rewrite IN expression arguments into an equivalent 
call.
+   * If there are only two elements, generates an EQUALS:
+   * IN [A,B] => EQUALS [A,B]
+   * Otherwise, tries to generate a SEARCH:
+   * IN [A,B,C] => SEARCH(A, SARG([B..B], [C..C]))
+   * If this is not possible (e.g., argument types not sufficiently compatible 
to generate a Calcite SEARCH expression),
+   * tries to generate an OR expression:
+   * IN [A,B,C] => OR [EQUALS [A,B], EQUALS [A,C]]
+   * If this is not possible (e.g., non-deterministic calls are found in the 
expressions), returns null
+   */
+  public static RexNode rewriteInClause(List<RexNode> childRexNodeLst, 
RexBuilder rexBuilder) {
+    if (childRexNodeLst.size() == 2) {
+      return rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, childRexNodeLst);
+    }
+
+    RexNode firstPred = childRexNodeLst.get(0);
+    List<RexNode> ranges = childRexNodeLst.subList(1, childRexNodeLst.size());
+    RexNode res = rexBuilder.makeIn(firstPred, ranges);
+    if (res.getKind() == SqlKind.SEARCH) {
+      return res;
+    }
+    // Calcite SEARCH conversion was not possible: generate our own OR 
expression
+    List<RexNode> newInputs = 
RexNodeConverter.transformInToOrOperands(childRexNodeLst, rexBuilder);
+    if (newInputs == null) {
+      return null;
+    }
+    return newInputs.size() == 1 ? newInputs.get(0) : 
rexBuilder.makeCall(SqlStdOperatorTable.OR, newInputs);

Review Comment:
   The RexBuilder#makeIn method also generates an OR when it cannot create a 
SEARCH. Is there any reason of why we opt for the Hive specific IN to OR 
transformation?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
##########
@@ -577,6 +570,37 @@ public static List<RexNode> 
transformInToOrOperands(List<RexNode> operands, RexB
     return disjuncts;
   }
 
+  /**
+   * This method tries to rewrite IN expression arguments into an equivalent 
call.
+   * If there are only two elements, generates an EQUALS:
+   * IN [A,B] => EQUALS [A,B]
+   * Otherwise, tries to generate a SEARCH:
+   * IN [A,B,C] => SEARCH(A, SARG([B..B], [C..C]))
+   * If this is not possible (e.g., argument types not sufficiently compatible 
to generate a Calcite SEARCH expression),
+   * tries to generate an OR expression:
+   * IN [A,B,C] => OR [EQUALS [A,B], EQUALS [A,C]]
+   * If this is not possible (e.g., non-deterministic calls are found in the 
expressions), returns null
+   */
+  public static RexNode rewriteInClause(List<RexNode> childRexNodeLst, 
RexBuilder rexBuilder) {
+    if (childRexNodeLst.size() == 2) {
+      return rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, childRexNodeLst);
+    }
+
+    RexNode firstPred = childRexNodeLst.get(0);
+    List<RexNode> ranges = childRexNodeLst.subList(1, childRexNodeLst.size());
+    RexNode res = rexBuilder.makeIn(firstPred, ranges);
+    if (res.getKind() == SqlKind.SEARCH) {
+      return res;
+    }
+    // Calcite SEARCH conversion was not possible: generate our own OR 
expression
+    List<RexNode> newInputs = 
RexNodeConverter.transformInToOrOperands(childRexNodeLst, rexBuilder);
+    if (newInputs == null) {
+      return null;
+    }
+    return newInputs.size() == 1 ? newInputs.get(0) : 
rexBuilder.makeCall(SqlStdOperatorTable.OR, newInputs);
+  }
+
+  // TODO remove?
   public static List<RexNode> rewriteInClauseChildren(SqlOperator op, 
List<RexNode> childRexNodeLst,

Review Comment:
   If not use then yes lets drop it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to