zabetak commented on code in PR #5196: URL: https://github.com/apache/hive/pull/5196#discussion_r2048510211
########## ql/src/test/queries/clientpositive/pointlookup3.q: ########## @@ -1,4 +1,6 @@ --! qt:dataset:src +-- SORT_QUERY_RESULTS + Review Comment: The sorting was reverted by the latest commits so marking this as resolved. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); Review Comment: It's not redundant with the latest commits since SEARCH is expanded to IN, BETWEEN, etc., before hitting the JDBC rules. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java: ########## @@ -198,6 +208,12 @@ public Double visitCall(RexCall call) { return null; } + if (call.getKind() == SqlKind.SEARCH) { + RexCall expandedCall = (RexCall) RexUtil.expandSearch(rexBuilder, null, call); + + return visitCall(expandedCall); + } + Review Comment: In the end, we adopted an expansion approach so SEARCH is expanded before hitting this part. The expansion still creates IN and BETWEEN so the rule remains mostly as it was before. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs + if (sarg.isPoints()) { + selectivity = computeFunctionSelectivity(call); + if (selectivity != null) { + selectivity = selectivity * sarg.pointCount; + if (selectivity <= 0.0) { + selectivity = 0.10; + } else if (selectivity >= 1.0) { + selectivity = 1.0; + } + } + break; + + } else { + return visitCall((RexCall) + transformOrToInAndInequalityToBetween( Review Comment: With the latest changes the expansion is done before hitting the estimator and this class remains mostly unchanged. ########## ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java: ########## @@ -508,8 +485,16 @@ public void testComputeRangePredicateSelectivityBetweenWithNULLS() { public void testComputeRangePredicateSelectivityNotBetweenWithNULLS() { doReturn((double) 20).when(tableMock).getRowCount(); doReturn(Collections.singletonList(stats)).when(tableMock).getColStat(Collections.singletonList(0)); - RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.BETWEEN, boolTrue, inputRef0, int1, int3); + RexNode filter = makeNotBetween(inputRef0, int1, int3); FilterSelectivityEstimator estimator = new FilterSelectivityEstimator(scan, mq); Assert.assertEquals(0.55, estimator.estimateSelectivity(filter), DELTA); } + + private RexNode makeNotBetween(RexNode inputRef, RexNode left, RexNode right) { + RexNode notBetween = REX_BUILDER + .makeCall(SqlStdOperatorTable.NOT, REX_BUILDER.makeBetween(inputRef, left, right)); + RexSimplify simplify = new RexSimplify(REX_BUILDER, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); + + return simplify.simplify(notBetween); Review Comment: The call to simplify is strange. What exactly do we want to test here? Can't we create the simplified expression directly?
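For illustration, constructing the already-simplified predicate directly, i.e. `OR($0 < 1, $0 > 3)` instead of simplifying `NOT(BETWEEN)`, could look roughly like the standalone sketch below. The `inputRef0`/`int1`/`int3` names only mirror the test fixtures; everything here is a hypothetical reconstruction, not the actual test code:

```java
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class NotBetweenDirectSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder rexBuilder = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);

    RexNode inputRef0 = rexBuilder.makeInputRef(intType, 0);
    RexNode int1 = rexBuilder.makeExactLiteral(BigDecimal.ONE, intType);
    RexNode int3 = rexBuilder.makeExactLiteral(BigDecimal.valueOf(3), intType);

    // Build OR($0 < 1, $0 > 3) directly, i.e. the simplified form of
    // NOT($0 BETWEEN 1 AND 3), without going through RexSimplify at all.
    RexNode notBetween = rexBuilder.makeCall(SqlStdOperatorTable.OR,
        rexBuilder.makeCall(SqlStdOperatorTable.LESS_THAN, inputRef0, int1),
        rexBuilder.makeCall(SqlStdOperatorTable.GREATER_THAN, inputRef0, int3));
    System.out.println(notBetween);
  }
}
```

Building the expression this way would pin down the exact shape the test exercises instead of depending on whatever RexSimplify happens to produce.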
########## ql/src/test/queries/clientpositive/search_nullability.q: ########## Review Comment: Removed in https://github.com/apache/hive/pull/5196/commits/56965963bf6bee744a6d3a6ef9bf1caf60e4f411. ########## ql/src/test/queries/clientpositive/search.q: ########## Review Comment: Removed in https://github.com/apache/hive/pull/5196/commits/56965963bf6bee744a6d3a6ef9bf1caf60e4f411. If we are not adding extra coverage then we don't need these tests. We can always add more tests if necessary even after the PR is merged. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java: ########## @@ -179,7 +178,7 @@ protected HivePointLookupOptimizerRule( this.minNumORClauses = minNumORClauses; } - public RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition) { + public static RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition, int minNumORClauses) { Review Comment: All these changes to HivePointLookupOptimizerRule were reverted so marking this as resolved. ########## ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java: ########## @@ -508,8 +492,19 @@ public void testComputeRangePredicateSelectivityBetweenWithNULLS() { public void testComputeRangePredicateSelectivityNotBetweenWithNULLS() { doReturn((double) 20).when(tableMock).getRowCount(); doReturn(Collections.singletonList(stats)).when(tableMock).getColStat(Collections.singletonList(0)); - RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.BETWEEN, boolTrue, inputRef0, int1, int3); + RexNode filter = makeNotBetween(inputRef0, int1, int3); FilterSelectivityEstimator estimator = new FilterSelectivityEstimator(scan, mq); Assert.assertEquals(0.55, estimator.estimateSelectivity(filter), DELTA); } + + private RexNode makeNotBetween(RexNode inputRef, RexNode left, RexNode right) { + RexNode withOr = REX_BUILDER.makeCall( + SqlStdOperatorTable.OR, + REX_BUILDER.makeCall(SqlStdOperatorTable.LESS_THAN, inputRef, left), + REX_BUILDER.makeCall(SqlStdOperatorTable.GREATER_THAN, inputRef, right) + ); + RexSimplify simplify = new RexSimplify(REX_BUILDER, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); + + return simplify.simplify(withOr); Review Comment: The changes were reverted by https://github.com/apache/hive/pull/5196/commits/1c215e52ce820ae77128a6eac45fa89f63638729. The unit tests are in place to ensure that the code that is handling the SqlKind.BETWEEN operator in FilterSelectivityEstimator works and has a certain behavior. As long as the SqlKind.BETWEEN logic is still there, it's not appropriate to modify the tests. If we want to test that RexBuilder#makeBetween generates an expression with the same selectivity then it should be done as another test but it's probably outside the scope of the upgrade. ########## ql/src/test/queries/clientpositive/cardinality_preserving_join_opt2.q: ########## @@ -2,6 +2,7 @@ set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.strict.checks.cartesian.product=false; set hive.stats.fetch.column.stats=true; +set hive.cbo.rule.exclusion.regex=HivePreFilteringRule; Review Comment: The problem was fixed by f24c0b9cc86248c231795db9f41c61c8aab5a139 and the changes to the test were reverted.
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: Review Comment: It's not dead code based on the latest approach where we continue to use IN and BETWEEN. ########## ql/src/test/results/clientpositive/llap/explainuser_4.q.out: ########## @@ -119,24 +119,24 @@ Stage-0 PARTITION_ONLY_SHUFFLE [RS_11] Group By Operator [GBY_10] (rows=1 width=8) Output:["_col0"],aggregations:["count()"] - Merge Join Operator [MERGEJOIN_29] (rows=4626 width=8) + Merge Join Operator [MERGEJOIN_29] (rows=3449 width=8) Conds:RS_32._col0=RS_35._col0(Inner) <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_32] PartitionCols:_col0 - Select Operator [SEL_31] (rows=3078 width=2) + Select Operator [SEL_31] (rows=2297 width=2) Output:["_col0"] - Filter Operator [FIL_30] (rows=3078 width=2) - predicate:cint BETWEEN 1000000 AND 3000000 + Filter Operator [FIL_30] (rows=2297 width=2) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000) TableScan [TS_0] (rows=12288 width=2) default@alltypesorc,a,Tbl:COMPLETE,Col:COMPLETE,Output:["cint"] <-Map 4 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_35] PartitionCols:_col0 - Select Operator [SEL_34] (rows=2298 width=2) + Select Operator [SEL_34] (rows=1715 width=2) Output:["_col0"] - Filter Operator [FIL_33] (rows=2298 width=8) - predicate:(cint BETWEEN 1000000 AND 3000000 and cbigint is not null) + Filter Operator [FIL_33] (rows=1715 width=8) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000 and cbigint is not null) Review Comment: Part of the redundancy is also due to Hive so I logged https://issues.apache.org/jira/browse/HIVE-28910. I don't think this is a blocker for the PR so I am resolving this conversation. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs Review Comment: Not relevant after latest changes. Resolving conversation. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java: ########## @@ -80,7 +82,12 @@ private HivePushdownSnapshotFilterRule(Config config) { @Override public void onMatch(RelOptRuleCall call) { HiveFilter filter = call.rel(0); - RexNode newCondition = filter.getCondition().accept(new SnapshotIdShuttle(call.builder().getRexBuilder(), call.getMetadataQuery(), filter)); + RexNode expandedCondition = RexUtil.expandSearch( + filter.getCluster().getRexBuilder(), null, filter.getCondition() + ); Review Comment: The problem was addressed by two changes: expanding SEARCH via another rule in the same phase, and avoiding Sarg literals by changing the type check (https://github.com/apache/hive/pull/5196/commits/d7e97fde183ee14957665fd8d45a15941f09106e).
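For context, here is a minimal standalone sketch of the expansion approach described above, using only the Calcite calls that also appear in the diffs (`RexBuilder#makeIn`, `RexUtil#expandSearch`); the column and literals are made up:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.type.SqlTypeName;

public class ExpandSearchSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder rexBuilder = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RexNode ref = rexBuilder.makeInputRef(intType, 0);

    // makeIn over literal values typically yields SEARCH($0, Sarg[1, 2, 3]).
    RexNode search = rexBuilder.makeIn(ref, ImmutableList.of(
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(1), intType),
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(2), intType),
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(3), intType)));
    System.out.println(search);

    // expandSearch rewrites the Sarg back into plain comparisons, e.g.
    // OR(=($0, 1), =($0, 2), =($0, 3)); passing a null RexProgram is allowed.
    System.out.println(RexUtil.expandSearch(rexBuilder, null, search));
  }
}
```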
########## ql/src/test/queries/clientpositive/pcs.q: ########## @@ -58,7 +58,7 @@ SORT BY A.key, A.value, A.ds; explain extended select ds from pcs_t1 where struct(case when ds='2000-04-08' then 10 else 20 end) in (struct(10),struct(11)); select ds from pcs_t1 where struct(case when ds='2000-04-08' then 10 else 20 end) in (struct(10),struct(11)); -explain extended select ds from pcs_t1 where struct(ds, key, rand(100)) in (struct('2000-04-08',1,0.2), struct('2000-04-09',2,0.3)); +explain extended select ds from pcs_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); Review Comment: Latest commits fixed the problem with partition pruning so we don't need a JIRA ticket for now. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); + case SEARCH: + return expandSearchAndRemoveRowOperator(rexBuilder, call); default: break; } } return RexUtil.isFlat(node) ? node : RexUtil.flatten(rexBuilder, node); } + + private RexNode expandSearchAndRemoveRowOperator(RexBuilder rexBuilder, RexCall expression) { + RexLiteral literal = (RexLiteral) expression.getOperands().get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + + if (sarg.isPoints()) { + // if it is a row operator, we'll have + // SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]] Review Comment: I addressed the problem slightly differently to avoid changing the rule too much. SEARCH is expanded to IN/BETWEEN/etc., before launching the JDBC rules (https://github.com/apache/hive/pull/5196/commits/98f809687a3054b15f1ccf721e52e8e634d43d02). Some changes were necessary in RexNodeConverter to handle the STRUCT literals that don't explicitly contain a ROW operator. 
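For illustration, the OR-of-ANDs shape that `transformIntoOrAndClause` is described as producing for a two-column IN could be built roughly as follows; `c1`/`c2` and the literal values are hypothetical, and this is only the target shape, not the rule's actual code:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class RowInExpansionSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder b = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RexNode c1 = b.makeInputRef(intType, 0);
    RexNode c2 = b.makeInputRef(intType, 1);
    RexNode one = b.makeExactLiteral(BigDecimal.ONE, intType);
    RexNode two = b.makeExactLiteral(BigDecimal.valueOf(2), intType);
    RexNode three = b.makeExactLiteral(BigDecimal.valueOf(3), intType);
    RexNode four = b.makeExactLiteral(BigDecimal.valueOf(4), intType);

    // Target shape for ($0, $1) IN ((1, 2), (3, 4)):
    // OR(AND(=($0, 1), =($1, 2)), AND(=($0, 3), =($1, 4)))
    RexNode expanded = RexUtil.composeDisjunction(b, ImmutableList.of(
        RexUtil.composeConjunction(b, ImmutableList.of(
            b.makeCall(SqlStdOperatorTable.EQUALS, c1, one),
            b.makeCall(SqlStdOperatorTable.EQUALS, c2, two))),
        RexUtil.composeConjunction(b, ImmutableList.of(
            b.makeCall(SqlStdOperatorTable.EQUALS, c1, three),
            b.makeCall(SqlStdOperatorTable.EQUALS, c2, four)))));
    System.out.println(expanded);
  }
}
```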
########## ql/src/test/results/clientpositive/llap/cbo_aggregate_reduce_functions_rule.q.out: ########## @@ -158,7 +158,7 @@ POSTHOOK: type: QUERY POSTHOOK: Input: default@test #### A masked pattern was here #### CBO PLAN: -HiveProject(_o__c0=[COALESCE($0, 0E0:DOUBLE)], _o__c1=[COALESCE($1, 0E0:DOUBLE)], _o__c2=[COALESCE($2, 0E0:DOUBLE)], _o__c3=[COALESCE($3, 0E0:DOUBLE)], _o__c4=[COALESCE($4, 0E0:DOUBLE)], _o__c5=[COALESCE($5, 0E0:DOUBLE)], _o__c6=[/($6, $7)], _o__c7=[/($1, $8)], _o__c8=[/($9, $10)], _o__c9=[/($3, $11)], _o__c10=[/($12, $13)], _o__c11=[/($5, $14)], _o__c12=[POWER(/(-($15, /(*($6, $6), $7)), $7), 0.5:DECIMAL(2, 1))], _o__c13=[POWER(/(-($16, /(*($1, $1), $8)), $8), 0.5:DECIMAL(2, 1))], _o__c14=[POWER(/(-($17, /(*($9, $9), $10)), $10), 0.5:DECIMAL(2, 1))], _o__c15=[POWER(/(-($18, /(*($3, $3), $11)), $11), 0.5:DECIMAL(2, 1))], _o__c16=[POWER(/(-($19, /(*($12, $12), $13)), $13), 0.5:DECIMAL(2, 1))], _o__c17=[POWER(/(-($20, /(*($5, $5), $14)), $14), 0.5:DECIMAL(2, 1))], _o__c18=[POWER(/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1))), 0.5:DECIMAL(2, 1))], _o__c19=[POWER(/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1))), 0.5:DECIMAL(2, 1))], _o__c20=[POWER(/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1))), 0.5:DECIMAL(2, 1))], _o__c21=[POWER(/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1))), 0.5:DECIMAL(2, 1))], _o__c22=[POWER(/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1))), 0.5:DECIMAL(2, 1))], _o__c23=[POWER(/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1))), 0.5:DECIMAL(2, 1))], _o__c24=[/(-($15, /(*($6, $6), $7)), $7)], _o__c25=[/(-($16, /(*($1, $1), $8)), $8)], _o__c26=[/(-($17, /(*($9, $9), $10)), $10)], _o__c27=[/(-($18, /(*($3, $3), $11)), $11)], _o__c28=[/(-($19, /(*($12, $12), $13)), $13)], _o__c29=[/(-($20, /(*($5, $5), $14)), $14)], _o__c30=[/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1)))], _o__c31=[/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1)))], _o__c32=[/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1)))], _o__c33=[/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1)))], _o__c34=[/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1)))], _o__c35=[/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1)))], _o__c36=[$0], _o__c37=[$1], _o__c38=[$2], _o__c39=[$3], _o__c40=[$4], _o__c41=[$5], _o__c42=[$21], _o__c43=[$8], _o__c44=[$22], _o__c45=[$11], _o__c46=[$23], _o__c47=[$14]) +HiveProject(_o__c0=[CAST(COALESCE($0, 0E0:DOUBLE)):DOUBLE], _o__c1=[CAST(COALESCE($1, 0E0:DOUBLE)):DOUBLE], _o__c2=[CAST(COALESCE($2, 0E0:DOUBLE)):DOUBLE], _o__c3=[CAST(COALESCE($3, 0E0:DOUBLE)):DOUBLE], _o__c4=[CAST(COALESCE($4, 0E0:DOUBLE)):DOUBLE], _o__c5=[CAST(COALESCE($5, 0E0:DOUBLE)):DOUBLE], _o__c6=[/($6, $7)], _o__c7=[/($1, $8)], _o__c8=[/($9, $10)], _o__c9=[/($3, $11)], _o__c10=[/($12, $13)], _o__c11=[/($5, $14)], _o__c12=[POWER(/(-($15, /(*($6, $6), $7)), $7), 0.5:DECIMAL(2, 1))], _o__c13=[POWER(/(-($16, /(*($1, $1), $8)), $8), 0.5:DECIMAL(2, 1))], _o__c14=[POWER(/(-($17, /(*($9, $9), $10)), $10), 0.5:DECIMAL(2, 1))], _o__c15=[POWER(/(-($18, /(*($3, $3), $11)), $11), 0.5:DECIMAL(2, 1))], _o__c16=[POWER(/(-($19, /(*($12, $12), $13)), $13), 0.5:DECIMAL(2, 1))], _o__c17=[POWER(/(-($20, /(*($5, $5), $14)), $14), 0.5:DECIMAL(2, 1))], _o__c18=[POWER(/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1))), 0.5:DECIMAL(2, 1))], _o__c19=[POWER(/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1))), 0.5:DECIMAL(2, 1))], _o__c20=[POWER(/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1))), 0.5:DECIMAL(2, 1))], _o__c21=[POWER(/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1))), 0.5:DECIMAL(2, 1))], _o__c22=[POWER(/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1))), 0.5:DECIMAL(2, 1))], _o__c23=[POWER(/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1))), 0.5:DECIMAL(2, 1))], _o__c24=[/(-($15, /(*($6, $6), $7)), $7)], _o__c25=[/(-($16, /(*($1, $1), $8)), $8)], _o__c26=[/(-($17, /(*($9, $9), $10)), $10)], _o__c27=[/(-($18, /(*($3, $3), $11)), $11)], _o__c28=[/(-($19, /(*($12, $12), $13)), $13)], _o__c29=[/(-($20, /(*($5, $5), $14)), $14)], _o__c30=[/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1)))], _o__c31=[/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1)))], _o__c32=[/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1)))], _o__c33=[/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1)))], _o__c34=[/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1)))], _o__c35=[/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1)))], _o__c36=[$0], _o__c37=[$1], _o__c38=[$2], _o__c39=[$3], _o__c40=[$4], _o__c41=[$5], _o__c42=[$21], _o__c43=[$8], _o__c44=[$22], _o__c45=[$11], _o__c46=[$23], _o__c47=[$14]) Review Comment: Why do we have extra `CAST` around the `COALESCE` call? It seems that the COALESCE returns a double so the CAST seems redundant. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); Review Comment: There was an attempt to produce NOT IN using HiveReduceSearchComplexityRule.java introduced by @soumyakanti3578 but later it was removed by https://github.com/apache/hive/pull/5196/commits/af781f834feb39cd41a06c392f97f3c6969acd84. The fact that NOT IN is not used is fine because it never appeared in the plans (.q.out files) before the upgrade. Nevertheless, the plan before the upgrade was slightly more performant in terms of CPU since there were fewer comparisons and operators involved. We can possibly improve the performance by fixing https://issues.apache.org/jira/browse/HIVE-28911 but this can wait a bit.
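A hedged sketch of the points-to-IN step discussed here, mirroring the Sarg handling visible in the diff above (the helper name and its contract are made up for illustration):

```java
import com.google.common.collect.Range;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.util.Sarg;

/** Hypothetical helper mirroring the points-to-IN branch shown in the diff. */
final class SargPointsSketch {
  private SargPointsSketch() {
  }

  /**
   * Returns the literal operands for an IN(ref, v1, v2, ...) call when the
   * Sarg of the given SEARCH call consists only of points; null otherwise.
   */
  static List<RexNode> pointsAsInOperands(RexBuilder rexBuilder, RexCall search) {
    RexLiteral literal = (RexLiteral) search.getOperands().get(1);
    Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
    if (!sarg.isPoints()) {
      return null;
    }
    List<RexNode> values = new ArrayList<>();
    for (Range<?> range : sarg.rangeSet.asRanges()) {
      // For a point range the lower and upper endpoints coincide,
      // so either endpoint yields the IN list value.
      values.add(rexBuilder.makeLiteral(range.lowerEndpoint(), literal.getType(), true, true));
    }
    return values;
  }
}
```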
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1067,9 +1079,17 @@ public ASTNode visitCall(RexCall call) { return SqlFunctionConverter.buildAST(SqlStdOperatorTable.NOT, Collections.singletonList(SqlFunctionConverter.buildAST(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, astNodeLst, call.getType())), call.getType()); case CAST: - assert(call.getOperands().size() == 1); + if (call.getOperands().size() != 1) { + throw new IllegalArgumentException("CASTs should have only 1 operand"); + } + RexNode castOperand = call.getOperands().get(0); + + // Extract RexNode out of CAST when it's not a literal and the types are equal + if (!(castOperand instanceof RexLiteral) && call.getType().equals(castOperand.getType())) { + return castOperand.accept(this); + } Review Comment: This was indeed a bug/regression in Calcite. It was fixed by https://issues.apache.org/jira/browse/CALCITE-6954 and we patched Hive using https://github.com/apache/hive/pull/5196/commits/d8d7ebda6cdc3b2bf456c99fabe16bee6e1032b1 ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java: ########## @@ -220,6 +234,37 @@ public Double visitCall(RexCall call) { return cost + functionCost(call); } + + private Double getSearchCost(RexCall call) { + HiveCalciteUtil.SearchTransformer<Double> transformer = new HiveCalciteUtil.SearchTransformer<>(); + transformer.transform(rexBuilder, call, + new RexFunctionCost(this.rexBuilder) { + @Override + public Double visitLiteral(RexLiteral literal) { + return HiveRelMdSize.INSTANCE.typeValueSize(literal.getType(), literal.getValueAs(Comparable.class)); + } + + @Override + public Double visitInputRef(RexInputRef inputRef) { + return HiveRelMdSize.INSTANCE.averageTypeValueSize(inputRef.getType()); + } + }); + + List<Double> operandCosts = new ArrayList<>(); + + if (!transformer.inNodes.isEmpty()) { + Double inOperandCosts = transformer.inNodes.stream().filter(Objects::nonNull).reduce(1D, Double::sum); + Double inFunctionCost = 2d * (transformer.inNodes.size() - 1); + operandCosts.add(inOperandCosts + inFunctionCost); + } + operandCosts.addAll(transformer.nodes); + + Double searchOperandCost = operandCosts.stream().filter(Objects::nonNull).reduce(1D, Double::sum); + // functionCost is essentially the cost of conjunctions or disjunctions + Double functionCost = operandCosts.size() == 1 ? 0D : 1d * operandCosts.size(); + + return searchOperandCost + functionCost; + } Review Comment: Instead of this transformation logic it may be OK to extend the `org.apache.calcite.rel.metadata.RelMdSize#typeValueSize` method to be able to handle literals of SARG type which could be useful in other places where we might end up calling the same logic. This could be also something to contribute to the CALCITE repo. 
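A minimal sketch of that suggestion, assuming `RelMdSize#typeValueSize` can be overridden in a subclass and that summing the sizes of the range endpoints is an acceptable approximation; the class name and the fallback behavior are hypothetical:

```java
import com.google.common.collect.Range;
import org.apache.calcite.rel.metadata.RelMdSize;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.util.Sarg;

/**
 * Hypothetical subclass sketching the suggested extension: price a SARG
 * literal by summing the sizes of its range endpoints.
 */
public class SargAwareRelMdSize extends RelMdSize {
  @Override
  public Double typeValueSize(RelDataType type, Comparable value) {
    if (value instanceof Sarg) {
      double total = 0d;
      for (Range<?> r : ((Sarg<?>) value).rangeSet.asRanges()) {
        if (r.hasLowerBound()) {
          total += endpointSize(type, r.lowerEndpoint());
        }
        // Skip the upper endpoint of a point range to avoid counting it twice.
        if (r.hasUpperBound()
            && !(r.hasLowerBound() && r.lowerEndpoint().equals(r.upperEndpoint()))) {
          total += endpointSize(type, r.upperEndpoint());
        }
      }
      return total;
    }
    return super.typeValueSize(type, value);
  }

  private double endpointSize(RelDataType type, Comparable endpoint) {
    Double size = super.typeValueSize(type, endpoint);
    // Treat unknown endpoint sizes as zero; this sketch does not try to
    // pick a better fallback.
    return size == null ? 0d : size;
  }
}
```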
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); Review Comment: All changes to ASTConverter were reverted so this discussion is obsolete. Still there is a different behavior cause when we expand SEARCH we will always generate IN and never OR irrespective of thresholds but I would argue that this is what should happen. ########## iceberg/iceberg-handler/src/test/results/positive/llap/llap_iceberg_read_orc.q.out: ########## @@ -359,11 +359,11 @@ STAGE PLANS: Map Operator Tree: TableScan alias: o - filterExpr: ((quantity NOT BETWEEN 39 AND 0 or quantity NOT BETWEEN 69 AND 39 or (quantity > 70)) and (((quantity > 0) and (quantity < 39)) or ((quantity > 39) and (quantity < 69)) or (quantity > 70)) and itemid is not null) (type: boolean) + filterExpr: ((((quantity > 0) and (quantity < 39)) or ((quantity > 39) and (quantity < 69)) or (quantity > 70)) and quantity is not null and itemid is not null) (type: boolean) probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_29_container, bigKeyColName:itemid, smallTablePos:1, keyRatio:0.9523809523809523 Review Comment: Copying below the message from https://github.com/apache/hive/pull/5196/commits/af781f834feb39cd41a06c392f97f3c6969acd84 A. NOT BETWEEN disappears completely and is transformed to disjunction with range predicates. ``` not hr BETWEEN 12 AND 14 => (hr < 12) or (hr > 14) ``` This is a net positive change. The new version is equally efficient and easier to handle since it relies on primitive operators that are handled better in Calcite and Hive. Since NOT BETWEEN disappears more code could be potentially dropped but it is outside the scope of this commit and upgrade. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java: ########## @@ -92,6 +91,15 @@ public HiveJdbcImplementor(SqlDialect dialect, JavaTypeFactory typeFactory) { sqlCondition); return result(join, leftResult, rightResult); } + + @Override + public Result visit(TableScan scan) { + if (!(scan instanceof JdbcTableScan)) { + return super.visit(scan); + } + + return result(((JdbcTableScan) scan).jdbcTable.tableName(), ImmutableList.of(Clause.FROM), scan, null); + } Review Comment: I still don't see why we need to override the default behavior. What errors are we getting if we don't add this logic? 
########## iceberg/iceberg-handler/src/test/results/positive/query_iceberg_metadata_of_unpartitioned_table.q.out: ########## Review Comment: Hopefully it will be fixed by https://github.com/apache/hive/pull/5196/commits/54adb497ed7fe89e0e7d64a16cec1fc513dcc174 ########## ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java: ########## @@ -292,7 +293,11 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode basePlan = executeProgram(basePlan, program.build(), mdProvider, executorProvider, singletonList(materialization)); program = new HepProgramBuilder(); - generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST, HivePushdownSnapshotFilterRule.INSTANCE); + generatePartialProgram(program, + false, + HepMatchOrder.DEPTH_FIRST, + HiveSearchRules.FILTER_SEARCH_EXPAND, Review Comment: We do the expansion on the filter only to allow the `HivePushdownSnapshotFilterRule` to trigger and work as before. The pushdown rule itself may re-introduce a SEARCH so at this point we don't strive to eliminate it from the plan. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java: ########## @@ -72,6 +78,22 @@ public HiveFilter(RelOptCluster cluster, RelTraitSet traits, RelNode child, RexN super(cluster, TraitsUtil.getDefaultTraitSet(cluster), child, condition); this.correlationInfos = new CorrelationInfoSupplier(getCondition()); } + + @Override + public RelWriter explainTerms(RelWriter pw) { + // Remove this method after upgrading Calcite to 1.35+ Review Comment: Changes were reverted in latest commits so this is not relevant anymore. Resolving the conversation. ########## iceberg/iceberg-handler/src/test/results/positive/delete_iceberg_mixed.q.out: ########## @@ -84,17 +84,17 @@ Stage-4 <-Reducer 2 [CONTAINS] File Output Operator [FS_46] table:{"name:":"default.ice01"} - Select Operator [SEL_44] (rows=3 width=295) + Select Operator [SEL_44] (rows=3 width=266) Output:["_col0","_col1","_col2","_col3","_col4","_col5"] - Merge Join Operator [MERGEJOIN_43] (rows=3 width=295) + Merge Join Operator [MERGEJOIN_43] (rows=3 width=266) Conds:RS_57._col4=RS_63._col0(Left Semi),Output:["_col0","_col1","_col2","_col3","_col4","_col5"] <-Map 1 [SIMPLE_EDGE] vectorized SHUFFLE [RS_57] PartitionCols:_col4 - Select Operator [SEL_55] (rows=5 width=295) + Select Operator [SEL_55] (rows=6 width=280) Review Comment: Not relevant anymore since these stat changes are not present in latest patch. Resolving the conversation. 
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: ########## @@ -656,7 +658,9 @@ public static ImmutableList<RexNode> getPredsNotPushedAlready(Set<String> predic } // Build map to not convert multiple times, further remove already included predicates Map<String,RexNode> stringToRexNode = Maps.newLinkedHashMap(); + RexSimplify simplify = new RexSimplify(inp.getCluster().getRexBuilder(), RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); for (RexNode r : predsToPushDown) { + r = simplify.simplify(r); Review Comment: The StackOverflow is addressed by https://github.com/apache/hive/pull/5196/commits/f24c0b9cc86248c231795db9f41c61c8aab5a139, which makes HivePreFilteringRule perform the reduction. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java: ########## @@ -268,6 +270,7 @@ private RexNode convert(ExprNodeGenericFuncDesc func) throws SemanticException { // except complex types // Rewrite to OR is done only if number of operands are less than // the threshold configured + // TODO Is this rewrite unconditional? Review Comment: All TODOs need to be removed before merging. We should create JIRAs if there are remaining tasks to follow up. (https://github.com/apache/hive/pull/5196/commits/612caceddf24f149e30d8d118dcd55f920720eeb). ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java: ########## @@ -22,8 +22,6 @@ import org.apache.calcite.rel.RelNode; import org.apache.calcite.rel.RelShuttle; import org.apache.calcite.rel.core.Filter; -import org.apache.calcite.rel.core.Join; -import org.apache.calcite.rel.core.TableScan; Review Comment: Restored. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java: ########## @@ -138,6 +143,18 @@ public boolean isSynthetic() { @Override public RelWriter explainTerms(RelWriter pw) { + // Remove this if block after upgrading Calcite to 1.35+ + if (CALCITE_VERSION < 35 + && pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES && getDigest().contains("SEARCH")) { + return ((HiveProject) this.accept( + RexUtil.searchShuttle( + new RexBuilder(new JavaTypeFactoryImpl(new HiveTypeSystemImpl())), + null, + -1 + ) + )).explainTerms(pw); + } Review Comment: Changes were reverted in latest commits so marking this as resolved. ########## packaging/pom.xml: ########## @@ -173,6 +173,10 @@ \Qhttps://www.eclipse.org/org/documents/epl-v10.php\E \Qhttp://www.eclipse.org/org/documents/epl-v10.php\E </EPL-1.0> + <EPL-2.0> + \Qhttps://raw.githubusercontent.com/locationtech/jts/master/LICENSE_EPLv2.txt\E + \Qhttps://www.eclipse.org/org/documents/epl-2.0/EPL-2.0.txt\E + </EPL-2.0> Review Comment: As part of the upgrade we need to add an explicit dependency to jts-core. The respective jar will be packaged in the binary distribution of Hive so we need to have its license packaged as well. The changes here are necessary and useful to avoid duplicate licenses. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java: ########## Review Comment: All TODOs need to be removed before merging. We should create JIRAs if there are remaining tasks to follow up. (https://github.com/apache/hive/pull/5196/commits/612caceddf24f149e30d8d118dcd55f920720eeb) ########## ql/src/test/results/clientpositive/llap/excluded_rule_explain.q.out: ########## Review Comment: https://issues.apache.org/jira/browse/CALCITE-6832 As said before, this is not urgent. We can fix it after merging the PR.
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // convert Sarg to IN when they are points. + if (sarg.isPoints()) { + // just expand SEARCH to ORs when point count is less than HIVE_POINT_LOOKUP_OPTIMIZER_MIN + if (sarg.pointCount < minOrClauses) { + return visitCall((RexCall) call.accept(RexUtil.searchShuttle(rexBuilder, null, -1))); + } + + // else convert to IN + for (Range<?> range : sarg.rangeSet.asRanges()) { + astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral( + range.lowerEndpoint(), literal.getType(), true, true))); + } + + return SqlFunctionConverter.buildAST(HiveIn.INSTANCE, astNodeLst, call.getType()); + // Expand SEARCH operator + } else { + return visitCall((RexCall) transformOrToInAndInequalityToBetween( + rexBuilder, + call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)), + minOrClauses + ) Review Comment: Now the transformation is done before hitting the AST converter and the logic is encapsulated in SearchTransformer. We can successfully convert to IN and BETWEEN but not to their negated counterparts. As discussed in other places it seems that we don't need to handle the NOT IN and NOT BETWEEN cases; at least not immediately. ########## ql/src/test/results/clientpositive/llap/cbo_join_transitive_pred_loop_1.q.out: ########## @@ -67,6 +67,6 @@ HiveProject(month=[$0], con_usd=[$2]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveProject(mth=[$0], con_usd=[$1]) - HiveFilter(condition=[IS NOT NULL($0)]) + HiveFilter(condition=[AND(IS NOT NULL($0), IN($0, 202110, 202503))]) HiveTableScan(table=[[default, test3]], table:alias=[d]) Review Comment: It seems correct to me. Isn't it the constant folding of the following expression? ```sql cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) ``` Notice that there is `add_months` with a `-1` argument. So if we take the current date and we subtract one month we go to March (03). ########## ql/src/test/results/clientpositive/llap/bucketpruning1.q.out: ########## @@ -1656,7 +1656,7 @@ POSTHOOK: type: QUERY POSTHOOK: Input: default@srcbucket_pruned #### A masked pattern was here #### OPTIMIZED SQL: SELECT * -FROM (VALUES (NULL, NULL, NULL)) AS `t` (`key`, `value`, `ds`) +FROM (SELECT NULL AS `key`, NULL AS `value`, NULL AS `ds`) AS `t` Review Comment: The before and after are equivalent and rather trivial so I don't think it matters. I guess we can always do the transformation when we have a single entry in VALUES, no matter whether it is NULL or not. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlAggFunction.java: ########## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.functions; + +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.SqlFunctionCategory; +import org.apache.calcite.sql.SqlIdentifier; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.sql.type.SqlOperandTypeChecker; +import org.apache.calcite.sql.type.SqlOperandTypeInference; +import org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.calcite.util.Optionality; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelBuilder; +import org.checkerframework.checker.nullness.qual.Nullable; + +public class HiveSqlAggFunction extends SqlAggFunction { Review Comment: Instead of using HiveSqlAggFunction and passing it from the RelBuilder, an alternative approach has been committed in https://github.com/apache/hive/pull/5196/commits/c0469f75a50b2249ce8c8506e80c85a931394d63 to overcome the breaking changes in CALCITE-4342. CALCITE-4342 made rollup strategy a first-class citizen of the SqlAggFunction interface so each function can return its own rollup if available. It's no longer necessary to pass from the RelBuilder; basically, whenever we need to obtain a rollup we have to ask the SqlAggFunction directly. ########## ql/src/test/queries/clientpositive/sharedwork.q: ########## @@ -20,6 +20,24 @@ create table MY_TABLE_0001_01 ( col_1 string, col_100 string); +explain cbo SELECT + Table__323.col_7, + CAST(Table__323.col_3 AS DATE) col_3, + Table__323.col_20, + Table__1232.col_21 col_21_1232, + Table__323.col_1, + Table__133.col_22, + Table__879.col_21 col_21_879 + ,Table__133.col_23 +FROM MY_TABLE_0001 Table__323 +LEFT OUTER JOIN MY_TABLE_0003 Table__1232 ON (Table__323.col_20=Table__1232.col_24) +LEFT OUTER JOIN MY_TABLE_0001_00 Table__133 ON (Table__323.col_1=Table__133.col_1) +LEFT OUTER JOIN MY_TABLE_0003 Table__879 ON (Table__133.col_22=Table__879.col_24) +LEFT OUTER JOIN MY_TABLE_0001_01 Table__1215 ON (Table__323.col_1=Table__1215.col_1 and Table__1215.col_100 = 210) +WHERE 1=1 +AND (cast(Table__323.col_7 AS DOUBLE) IS NOT NULL OR Table__323.col_7 IS NULL) +AND CAST(Table__323.col_3 AS DATE) BETWEEN '2018-07-01' AND '2019-01-23' +AND Table__323.col_20 IN ('part1','part2','part3'); Review Comment: Changes were reverted so marking this as resolved. ########## ql/src/test/results/clientpositive/llap/bucketsortoptimize_insert_7.q.out: ########## @@ -95,10 +95,10 @@ STAGE PLANS: Map Operator Tree: TableScan alias: b - filterExpr: (key) IN (0, 5) (type: boolean) + filterExpr: ((key) IN (0, 5) and key is not null) (type: boolean) Statistics: Num rows: 84 Data size: 7896 Basic stats: COMPLETE Column stats: COMPLETE Review Comment: I think we can live with some redundant NOT NULL predicates for a while. The performance impact should be minimal and probably not noticeable. However, it would be nice to address a few of those so I logged https://issues.apache.org/jira/browse/HIVE-28910.
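As a quick way to check whether Calcite's simplifier already treats such an IS NOT NULL as redundant next to the IN/SEARCH predicate, here is a hypothetical standalone snippet (in a WHERE clause UNKNOWN behaves as FALSE, hence `simplifyUnknownAsFalse`); the column and values are invented:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.plan.RelOptPredicateList;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexSimplify;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class RedundantNotNullCheck {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder b = new RexBuilder(typeFactory);
    // Use a nullable column type so IS NOT NULL is not trivially TRUE.
    RelDataType intType = typeFactory.createTypeWithNullability(
        typeFactory.createSqlType(SqlTypeName.INTEGER), true);
    RexNode key = b.makeInputRef(intType, 0);

    RexNode in = b.makeIn(key, ImmutableList.of(
        b.makeExactLiteral(BigDecimal.ZERO, intType),
        b.makeExactLiteral(BigDecimal.valueOf(5), intType)));
    RexNode notNull = b.makeCall(SqlStdOperatorTable.IS_NOT_NULL, key);

    // In a filter context, the IS NOT NULL conjunct should be absorbed
    // by the IN/SEARCH predicate during simplification.
    RexSimplify simplify = new RexSimplify(b, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR);
    System.out.println(simplify.simplifyUnknownAsFalse(
        b.makeCall(SqlStdOperatorTable.AND, notNull, in)));
  }
}
```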
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RangeConverter.java: ########## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite; + +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlOperator; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.util.RangeSets; +import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveBetween; + +import java.util.ArrayList; +import java.util.List; + +class RangeConverter<C extends Comparable<C>> implements RangeSets.Consumer<C> { + + protected final RexBuilder rexBuilder; + protected final RelDataType type; + protected final RexNode ref; + public final List<RexNode> inNodes; + public final List<RexNode> nodes; Review Comment: Since tests are now green, I prefer to avoid touching the code for minor stylistic issues. If more changes are needed I will fix this; otherwise we can do it in a follow-up.
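For readers unfamiliar with the callback interface that RangeConverter implements, here is a small self-contained demo of `RangeSets.forEach` dispatching on the shape of each range (point, closed interval, ray); the printed strings are just placeholders for the RexNode construction the real class performs:

```java
import com.google.common.collect.ImmutableRangeSet;
import com.google.common.collect.Range;
import com.google.common.collect.RangeSet;
import org.apache.calcite.util.RangeSets;

public class RangeConsumerDemo {
  public static void main(String[] args) {
    RangeSet<Integer> ranges = ImmutableRangeSet.<Integer>builder()
        .add(Range.singleton(5))          // a point: candidate for IN / =
        .add(Range.closed(10, 20))        // a closed interval: candidate for BETWEEN
        .add(Range.greaterThan(30))       // an open ray: candidate for >
        .build();
    RangeSets.forEach(ranges, new RangeSets.Consumer<Integer>() {
      @Override public void singleton(Integer value) {
        System.out.println("point " + value);
      }
      @Override public void closed(Integer lower, Integer upper) {
        System.out.println("between " + lower + " and " + upper);
      }
      @Override public void greaterThan(Integer lower) {
        System.out.println("> " + lower);
      }
      // Remaining callbacks cover the other range shapes; left as no-ops here.
      @Override public void all() { }
      @Override public void atLeast(Integer lower) { }
      @Override public void atMost(Integer upper) { }
      @Override public void lessThan(Integer upper) { }
      @Override public void open(Integer lower, Integer upper) { }
      @Override public void closedOpen(Integer lower, Integer upper) { }
      @Override public void openClosed(Integer lower, Integer upper) { }
    });
  }
}
```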
########## iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out: ########## @@ -150,27 +150,27 @@ Stage-0 File Output Operator [FS_61] Limit [LIM_60] (rows=20 width=447) Number of rows:20 - Select Operator [SEL_59] (rows=791 width=447) + Select Operator [SEL_59] (rows=473 width=447) Output:["_col0","_col1","_col2","_col3","_col4"] <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_58] - Top N Key Operator [TNK_57] (rows=791 width=447) + Top N Key Operator [TNK_57] (rows=473 width=447) keys:_col0,top n:20 - Map Join Operator [MAPJOIN_56] (rows=791 width=447) + Map Join Operator [MAPJOIN_56] (rows=473 width=447) BucketMapJoin:true,Conds:SEL_55._col0, _col1=RS_53._col0, _col1(Inner),Output:["_col0","_col1","_col2","_col3","_col4"] <-Map 3 [CUSTOM_EDGE] vectorized, llap MULTICAST [RS_53] PartitionCols:_col0, _col1 - Select Operator [SEL_52] (rows=500 width=178) + Select Operator [SEL_52] (rows=387 width=178) Output:["_col0","_col1"] - Filter Operator [FIL_51] (rows=500 width=178) - predicate:((key <> '0') and (key <> '100') and value is not null) + Filter Operator [FIL_51] (rows=387 width=178) + predicate:(((key < '0') or ((key > '0') and (key < '100')) or (key > '100')) and key is not null and value is not null) TableScan [TS_3] (rows=500 width=178) default@src,b,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"] - <-Select Operator [SEL_55] (rows=500 width=269) + <-Select Operator [SEL_55] (rows=387 width=269) Output:["_col0","_col1","_col2"] - Filter Operator [FIL_54] (rows=500 width=269) - predicate:((key1 <> '0') and (key1 <> '100') and key2 is not null) + Filter Operator [FIL_54] (rows=387 width=269) + predicate:(((key1 < '0') or ((key1 > '0') and (key1 < '100')) or (key1 > '100')) and key1 is not null and key2 is not null) TableScan [TS_0] (rows=500 width=269) Review Comment: Yes, it is discussed here: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.nfxo4n2hxorj I also logged https://issues.apache.org/jira/browse/HIVE-28911 but I think it's not blocking for merging the PR. ########## ql/src/test/results/clientpositive/llap/concat_op.q.out: ########## Review Comment: I guess we can live with this problem for a little while. We don't have many consumers of the JSON format so it is unlikely to have much impact on end-users. The problem is already fixed in https://issues.apache.org/jira/browse/CALCITE-6832 so we can patch Hive shortly after this PR is merged. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: ########## @@ -1177,4 +1185,633 @@ public static List<Integer> getNewRelDistributionKeys( } return distributionKeys; } + + /** + * This class just wraps around a RexNode enables equals/hashCode based on toString.
+ * After CALCITE-2632 this might not be needed anymore */ + public static class RexNodeRef { + + public static Comparator<RexNodeRef> COMPARATOR = + Comparator.comparing((RexNodeRef o) -> o.node.toString()); + private final RexNode node; + + public RexNodeRef(RexNode node) { + this.node = node; + } + + public RexNode getRexNode() { + return node; + } + + @Override + public int hashCode() { + return node.toString().hashCode(); + } + + @Override + public boolean equals(Object o) { + if (o instanceof RexNodeRef) { + RexNodeRef otherRef = (RexNodeRef) o; + return node.toString().equals(otherRef.node.toString()); + } + return false; + } + + @Override + public String toString() { + return "ref for:" + node.toString(); + } + } + + /** + * Represents a constraint. + * + * Example: a=1 + * substr(a,1,2) = concat('asd','xxx') + */ + public static class Constraint { + + private final RexNode exprNode; + private final RexNode constNode; + + public Constraint(RexNode exprNode, RexNode constNode) { + this.exprNode = exprNode; + this.constNode = constNode; + } + + /** + * Interprets argument as a constraint; if not possible returns null. + */ + public static Constraint of(RexNode n) { + if (!(n instanceof RexCall)) { + return null; + } + RexCall call = (RexCall) n; + if (call.getOperator().getKind() != SqlKind.EQUALS) { + return null; + } + RexNode opA = call.operands.get(0); + RexNode opB = call.operands.get(1); + if (RexUtil.isNull(opA) || RexUtil.isNull(opB)) { + // dont try to compare nulls + return null; + } + if (isConstExpr(opA) && isColumnExpr(opB)) { + return new Constraint(opB, opA); + } + if (isColumnExpr(opA) && isConstExpr(opB)) { + return new Constraint(opA, opB); + } + return null; + } + + private static boolean isColumnExpr(RexNode node) { + return !node.getType().isStruct() && !HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + private static boolean isConstExpr(RexNode node) { + return !node.getType().isStruct() && HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + public RexNodeRef getKey() { + return new RexNodeRef(exprNode); + } + + public RexNode getValue() { + return constNode; + } + + } + + /** + * Merge IN clauses, when possible. + */ + public static class RexMergeInClause extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexMergeInClause(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override public RexNode visitCall(RexCall call) { + switch (call.getKind()) { + case AND: + return handleAND(rexBuilder, call); + case OR: + return handleOR(rexBuilder, call); + default: + return super.visitCall(call); + } + } + + private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // Visited nodes + final Set<RexNode> visitedRefs = new LinkedHashSet<>(); + // IN clauses need to be combined by keeping only common elements + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. 
+ final Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + visitedRefs.add(ref); + if (ref.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { + Set<SimilarRexNodeElement> expressions = Sets.newHashSet(); + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(ref); + if (!shareSameType(knownConstants, expressions)) { + return call; + } + knownConstants.retainAll(expressions); + } else { + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + } + operands.remove(i); + --i; + } else if (operand.getKind() == SqlKind.EQUALS) { + Constraint c = Constraint.of(operand); + if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { + continue; + } + visitedRefs.add(c.exprNode); + if (c.exprNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.exprNode); + } + if (c.constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.constNode); + } + if (inLHSExprToRHSExprs.containsKey(c.exprNode)) { + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(c.exprNode); + Collection<SimilarRexNodeElement> nextConstant = Collections.singleton(new SimilarRexNodeElement(c.constNode)); + if (!shareSameType(knownConstants, nextConstant)) { + return call; + } + knownConstants.retainAll(nextConstant); + } else { + inLHSExprToRHSExprs.put(c.exprNode, new SimilarRexNodeElement(c.constNode)); + } + operands.remove(i); + --i; + } + } + // Create IN clauses + final List<RexNode> newOperands = createInClauses(rexBuilder, + visitedRefs, inLHSExprToRHSExprs, inLHSExprToRHSNullableExprs); + newOperands.addAll(operands); + // Return node + return RexUtil.composeConjunction(rexBuilder, newOperands, false); + } + + protected static class SimilarRexNodeElement { + private final RexNode rexNode; + + protected SimilarRexNodeElement(RexNode rexNode) { + this.rexNode = rexNode; + } + + public RexNode getRexNode() { + return rexNode; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (o == null || getClass() != o.getClass()) return false; + SimilarRexNodeElement that = (SimilarRexNodeElement) o; + return equalsWithSimilarType(rexNode, that.rexNode); + } + + private static boolean equalsWithSimilarType(RexNode rexNode1, RexNode rexNode2) { + if (!(rexNode1 instanceof RexLiteral) || !(rexNode2 instanceof RexLiteral)) { + return rexNode1.equals(rexNode2); + } + + RexLiteral rexLiteral1 = (RexLiteral) rexNode1; + RexLiteral rexLiteral2 = (RexLiteral) rexNode2; + + if (rexLiteral1.getValue() == null && rexLiteral2.getValue() == null) { + 
return true; + } + + return rexLiteral1.getValue() != null && rexLiteral1.getValue().compareTo(rexLiteral2.getValue()) == 0 && + rexLiteral1.getType().getSqlTypeName().equals(rexLiteral2.getType().getSqlTypeName()); + } + + @Override + public int hashCode() { + if (rexNode instanceof RexLiteral) { + RexLiteral rexLiteral = (RexLiteral) rexNode; + return Objects.hash(rexLiteral.getValue(), rexLiteral.getType().getSqlTypeName()); + } + return Objects.hash(rexNode); + } + } + + /** + * Check if the type of nodes in the two collections is homogeneous within the collections + * and identical between them. + * @param nodes1 the first collection of nodes + * @param nodes2 the second collection of nodes + * @return true if nodes in both collections is unique and identical, false otherwise + */ + private static boolean shareSameType( + Collection<SimilarRexNodeElement> nodes1, Collection<SimilarRexNodeElement> nodes2) { + return Stream.of(nodes1, nodes2).flatMap(Collection::stream) + .map(n -> n.getRexNode().getType().getSqlTypeName()) + .distinct() + .count() == 1; + } + + private static RexNode handleOR(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping all elements + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenOr(call.getOperands())); + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + for (int j = 1; j < inCall.getOperands().size(); j++) { + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(inCall.getOperands().get(j))); + } + operands.remove(i); + --i; + } + } + // Create IN clauses (fourth parameter is not needed since no expressions were removed) + final List<RexNode> newOperands = createInClauses(rexBuilder, + inLHSExprToRHSExprs.keySet(), inLHSExprToRHSExprs, null); + newOperands.addAll(operands); + // Return node + RexNode result = RexUtil.composeDisjunction(rexBuilder, newOperands, false); + if (!result.getType().equals(call.getType())) { + return rexBuilder.makeCast(call.getType(), result, true); + } + return result; + } + + private static RexNode createResultFromEmptySet(RexBuilder rexBuilder, + RexNode ref, Multimap<RexNode, RexNode> inLHSExprToRHSNullableExprs) { + if (inLHSExprToRHSNullableExprs.containsKey(ref)) { + // We handle possible null values in the expressions. 
+ List<RexNode> nullableExprs = + inLHSExprToRHSNullableExprs.get(ref) + .stream() + .map(n -> rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ImmutableList.of(n))) + .collect(Collectors.toList()); + return RexUtil.composeConjunction(rexBuilder, + ImmutableList.of( + RexUtil.composeDisjunction(rexBuilder, nullableExprs, false), + rexBuilder.makeNullLiteral(rexBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN))), + false); + } + return rexBuilder.makeLiteral(false); + } + + private static List<RexNode> createInClauses(RexBuilder rexBuilder, Set<RexNode> visitedRefs, + Multimap<RexNode, SimilarRexNodeElement> inLHSExprToRHSExprs, Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs) { + final List<RexNode> newExpressions = new ArrayList<>(); + for (RexNode ref : visitedRefs) { + Collection<SimilarRexNodeElement> exprs = inLHSExprToRHSExprs.get(ref); + if (exprs.isEmpty()) { + // Note that Multimap does not keep a key if all its values are removed. + newExpressions.add(createResultFromEmptySet(rexBuilder, ref, inLHSExprToRHSNullableExprs)); + } else if (exprs.size() == 1) { + List<RexNode> newOperands = new ArrayList<>(2); + newOperands.add(ref); + newOperands.add(exprs.iterator().next().getRexNode()); + newExpressions.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, newOperands)); + } else { + List<RexNode> newOperands = new ArrayList<>(exprs.size() + 1); + newOperands.add(ref); + newOperands.addAll(exprs.stream().map(SimilarRexNodeElement::getRexNode).collect(Collectors.toList())); + newExpressions.add(rexBuilder.makeIn(newOperands.get(0), newOperands.subList(1, newOperands.size()))); + } + } + return newExpressions; + } + + } + + /** + * Transforms inequality candidates into [NOT] BETWEEN calls. + * + */ + public static class RexTransformIntoBetween extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexTransformIntoBetween(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + switch (call.getKind()) { + case AND: + return processComparisons(call, SqlKind.LESS_THAN_OR_EQUAL, false); + case OR: + return processComparisons(call, SqlKind.GREATER_THAN, true); + default: + break; + } + } + return node; + } + + /** + * Represents a replacement candidate. + */ + static class BetweenCandidate { + + private final RexNode newNode; + private final RexNode[] oldNodes; + // keeps track if this candidate was already used during replacement + private boolean used; + + public BetweenCandidate(RexNode newNode, RexNode... 
oldNodes) { + this.newNode = newNode; + this.oldNodes = oldNodes; + } + } + + private RexNode processComparisons(RexCall call, SqlKind forwardEdge, boolean invert) { + DiGraph<RexNodeRef, RexCall> g = + buildComparisonGraph(call.getOperands(), forwardEdge); + Map<RexNode, BetweenCandidate> replacedNodes = new IdentityHashMap<>(); + for (RexNodeRef n : g.nodes()) { + Set<RexNodeRef> pred = g.predecessors(n); + Set<RexNodeRef> succ = g.successors(n); + if (!pred.isEmpty() && !succ.isEmpty()) { + RexNodeRef p = pred.iterator().next(); + RexNodeRef s = succ.iterator().next(); + + RexNode between = rexBuilder.makeCall(HiveBetween.INSTANCE, + rexBuilder.makeLiteral(invert), n.getRexNode(), p.getRexNode(), s.getRexNode()); + BetweenCandidate bc = new BetweenCandidate( + between, + g.removeEdge(p, n), + g.removeEdge(n, s)); + + for (RexNode node : bc.oldNodes) { + replacedNodes.put(node, bc); + } + } + } + if (replacedNodes.isEmpty()) { + // no effect + return call; + } + List<RexNode> newOperands = new ArrayList<>(); + for (RexNode o : call.getOperands()) { + BetweenCandidate candidate = replacedNodes.get(o); + if (candidate == null) { + newOperands.add(o); + } else { + if (!candidate.used) { + newOperands.add(candidate.newNode); + candidate.used = true; + } + } + } + + if (newOperands.size() == 1) { + return newOperands.get(0); + } else { + return rexBuilder.makeCall(call.getOperator(), newOperands); + } + } + + /** + * Builds a graph of the given comparison type. + * The graph edges are annotated with the RexNodes representing the comparison. + */ + private DiGraph<RexNodeRef, RexCall> buildComparisonGraph(List<RexNode> operands, SqlKind cmpForward) { + DiGraph<RexNodeRef, RexCall> g = new DiGraph<>(); + for (RexNode node : operands) { + if(!(node instanceof RexCall) ) { + continue; + } + RexCall rexCall = (RexCall) node; + SqlKind kind = rexCall.getKind(); + if (kind == cmpForward) { + RexNode opA = rexCall.getOperands().get(0); + RexNode opB = rexCall.getOperands().get(1); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } else if (kind == cmpForward.reverse()) { + RexNode opA = rexCall.getOperands().get(1); + RexNode opB = rexCall.getOperands().get(0); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } + } + return g; + } + } + + /** + * Transforms OR clauses into IN clauses, when possible. + */ + public static class RexTransformIntoInClause extends RexShuttle { + private final RexBuilder rexBuilder; + private final int minNumORClauses; + + public RexTransformIntoInClause(RexBuilder rexBuilder, int minNumORClauses) { + this.rexBuilder = rexBuilder; + this.minNumORClauses = minNumORClauses; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + if (call.getKind() == SqlKind.OR) { + try { + RexNode newNode = transformIntoInClauseCondition(rexBuilder, + call, minNumORClauses); + if (newNode != null) { + return newNode; + } + } catch (SemanticException e) { + LOG.error("Exception in HivePointLookupOptimizerRule", e); + return call; + } + } + } + return node; + } + + /** + * A group of Constraints. + * + * Examples: + * (a=1 && b=1) + * (a=1) + * + * Note: any rexNode is accepted as constraint; but it might be keyed with the empty key; + * which means it can't be parsed as a constraint for some reason; but for completeness... 
+     *
+     */
+    static class ConstraintGroup {
+      private final Map<RexNodeRef, Constraint> constraints = new HashMap<>();
+      private final RexNode originalRexNode;
+      private final Set<RexNodeRef> key;
+
+      public ConstraintGroup(RexNode rexNode) {
+        originalRexNode = rexNode;
+
+        final List<RexNode> conjunctions = RelOptUtil.conjunctions(rexNode);
+
+        for (RexNode n : conjunctions) {
+
+          Constraint c = Constraint.of(n);
+          if (c == null) {
+            // interpretation failed; make this node opaque
+            key = Collections.emptySet();
+            return;
+          }
+          constraints.put(c.getKey(), c);
+        }
+        if (constraints.size() != conjunctions.size()) {
+          LOG.debug("unexpected situation; giving up on this branch");
+          key = Collections.emptySet();
+          return;
+        }
+        key = constraints.keySet();
+      }
+
+      public List<RexNode> getValuesInOrder(List<RexNodeRef> columns) throws SemanticException {
+        List<RexNode> ret = new ArrayList<>();
+        for (RexNodeRef rexInputRef : columns) {
+          Constraint constraint = constraints.get(rexInputRef);
+          if (constraint == null) {
+            throw new SemanticException("Unable to find constraint which was earlier added.");
+          }
+          ret.add(constraint.getValue());
+        }
+        return ret;
+      }
+    }
+
+    private RexNode transformIntoInClauseCondition(RexBuilder rexBuilder, RexNode condition,
+        int minNumORClauses) throws SemanticException {
+      assert condition.getKind() == SqlKind.OR;
+
+      ImmutableList<RexNode> operands = RexUtil.flattenOr(((RexCall) condition).getOperands());
+      if (operands.size() < minNumORClauses) {
+        // We bail out
+        return null;
+      }
+      List<ConstraintGroup> allNodes = new ArrayList<>();
+      List<ConstraintGroup> processedNodes = new ArrayList<>();
+      for (int i = 0; i < operands.size(); i++) {
+        ConstraintGroup m = new ConstraintGroup(operands.get(i));
+        allNodes.add(m);
+      }
+
+      Multimap<Set<RexNodeRef>, ConstraintGroup> assignmentGroups =
+          Multimaps.index(allNodes, cg -> cg.key);
+
+      for (Map.Entry<Set<RexNodeRef>, Collection<ConstraintGroup>> sa : assignmentGroups.asMap().entrySet()) {
+        // skip opaque
+        if (sa.getKey().isEmpty()) {
+          continue;
+        }
+        // not enough equalities should not be handled
+        if (sa.getValue().size() < 2 || sa.getValue().size() < minNumORClauses) {
+          continue;
+        }
+
+        allNodes.add(new ConstraintGroup(buildInFor(sa.getKey(), sa.getValue())));
+        processedNodes.addAll(sa.getValue());
+      }
+
+      if (processedNodes.isEmpty()) {
+        return null;
+      }
+      allNodes.removeAll(processedNodes);
+      List<RexNode> ops = new ArrayList<>();
+      for (ConstraintGroup mx : allNodes) {
+        ops.add(mx.originalRexNode);
+      }
+      if (ops.size() == 1) {
+        return ops.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.OR, ops);
+      }
+
+    }
+
+    private RexNode buildInFor(Set<RexNodeRef> set, Collection<ConstraintGroup> value) throws SemanticException {
+
+      List<RexNodeRef> columns = new ArrayList<>(set);
+      columns.sort(RexNodeRef.COMPARATOR);
+      List<RexNode> operands = new ArrayList<>();
+
+      List<RexNode> columnNodes = columns.stream().map(RexNodeRef::getRexNode).collect(Collectors.toList());
+      operands.add(useStructIfNeeded(columnNodes));
+      for (ConstraintGroup node : value) {
+        List<RexNode> values = node.getValuesInOrder(columns);
+        operands.add(useStructIfNeeded(values));
+      }
+
+      return rexBuilder.makeIn(operands.get(0), operands.subList(1, operands.size()));
+    }
+
+    private RexNode useStructIfNeeded(List<? extends RexNode> columns) {
+      // Create STRUCT clause
+      if (columns.size() == 1) {
+        return columns.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.ROW, columns);
+      }
+    }
+
+  }
+
+  public static RexNode transformOrToInAndInequalityToBetween(
+      RexBuilder rexBuilder, RexNode condition, int minNumORClauses) {
+    // 1. We try to transform possible candidates
+    RexTransformIntoInClause transformIntoInClause = new RexTransformIntoInClause(rexBuilder, minNumORClauses);
+    RexNode newCondition = transformIntoInClause.apply(condition);
+
+    // 2. We merge IN expressions
+    RexMergeInClause mergeInClause = new RexMergeInClause(rexBuilder);
+    newCondition = mergeInClause.apply(newCondition);
+
+    // 3. Close BETWEEN expressions if possible
+    RexTransformIntoBetween t = new RexTransformIntoBetween(rexBuilder);
+    newCondition = t.apply(newCondition);
+    return newCondition;
+  }

Review Comment:
   Logged https://issues.apache.org/jira/browse/HIVE-28907 for reworking the rule after the introduction of the SEARCH operator.
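   To make the three steps concrete, here is an illustrative SQL-level sketch of the rewrites (the table `t` and its columns are hypothetical; the rule itself operates on RexNodes, and the step-2 behavior is inferred from the merge/empty-set code shown above):

       -- 1. RexTransformIntoInClause: OR-ed (multi-column) equalities fold into IN,
       --    once the minNumORClauses threshold (HIVE_POINT_LOOKUP_OPTIMIZER_MIN) is met.
       SELECT * FROM t WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2);
       --    becomes: (a, b) IN ((1, 1), (2, 2))     -- ROW/STRUCT form via useStructIfNeeded

       -- 2. RexMergeInClause: IN sets over the same expression are merged.
       SELECT * FROM t WHERE a IN (1, 2) AND a IN (2, 3);
       --    becomes: a IN (2)
       SELECT * FROM t WHERE a IN (1) AND a IN (2);
       --    empty intersection: FALSE, except NULL when a IS NULL
       --    (the createResultFromEmptySet branch above)

       -- 3. RexTransformIntoBetween: comparison chains are closed into [NOT] BETWEEN.
       SELECT * FROM t WHERE 1 <= x AND x <= 10;   -- becomes x BETWEEN 1 AND 10
       SELECT * FROM t WHERE x < 1 OR x > 10;      -- becomes x NOT BETWEEN 1 AND 10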
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java: ##########
@@ -138,6 +141,17 @@ public boolean isSynthetic() {

   @Override
   public RelWriter explainTerms(RelWriter pw) {
+    // Remove this if block after upgrading Calcite to 1.35+

Review Comment:
   Not relevant with latest changes. Resolving.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java: ##########
@@ -128,6 +129,11 @@ public HiveJdbcImplementor(SqlDialect dialect, JavaTypeFactory typeFactory) {
     return result(join, leftResult, rightResult);
   }

+  public Result visit(JdbcTableScan scan) {
+    return result(scan.jdbcTable.tableName(),
+        ImmutableList.of(Clause.FROM), scan, null);
+  }
+

Review Comment:
   Since we are just restoring what the breaking change in CALCITE-4640 removed, introducing this method here makes sense.

########## ql/src/test/results/clientpositive/perf/tpcds30tb/tez/cbo_query13.q.out: ##########
@@ -5,19 +5,19 @@ HiveProject(_o__c0=[/(CAST($0):DOUBLE, $1)], _o__c1=[CAST(/($2, $3)):DECIMAL(11,
   HiveJoin(condition=[=($1, $18)], joinType=[inner], algorithm=[none], cost=[not available])
     HiveJoin(condition=[AND(=($2, $14), OR(AND($15, $7), AND($16, $8), AND($17, $9)))], joinType=[inner], algorithm=[none], cost=[not available])
       HiveJoin(condition=[=($6, $13)], joinType=[inner], algorithm=[none], cost=[not available])
-        HiveProject(ss_cdemo_sk=[$3], ss_hdemo_sk=[$4], ss_addr_sk=[$5], ss_quantity=[$9], ss_ext_sales_price=[$14], ss_ext_wholesale_cost=[$15], ss_sold_date_sk=[$22], BETWEEN=[BETWEEN(false, $21, 100:DECIMAL(12, 2), 200:DECIMAL(12, 2))], BETWEEN9=[BETWEEN(false, $21, 150:DECIMAL(12, 2), 300:DECIMAL(12, 2))], BETWEEN10=[BETWEEN(false, $21, 50:DECIMAL(12, 2), 250:DECIMAL(12, 2))], BETWEEN11=[BETWEEN(false, $12, 100:DECIMAL(3, 0), 150:DECIMAL(3, 0))], BETWEEN12=[BETWEEN(false, $12, 50:DECIMAL(2, 0), 100:DECIMAL(3, 0))], BETWEEN13=[BETWEEN(false, $12, 150:DECIMAL(3, 0), 200:DECIMAL(3, 0))])
-        HiveFilter(condition=[AND(IS NOT NULL($3), IS NOT NULL($5), IS NOT NULL($4), IS NOT NULL($6), OR(<=(100:DECIMAL(3, 0), $12), <=($12, 150:DECIMAL(3, 0)), <=(50:DECIMAL(2, 0), $12), <=($12, 100:DECIMAL(3, 0)), <=(150:DECIMAL(3, 0), $12), <=($12, 200:DECIMAL(3, 0))), OR(<=(100:DECIMAL(12, 2), $21), <=($21, 200:DECIMAL(12, 2)), <=(150:DECIMAL(12, 2), $21), <=($21, 300:DECIMAL(12, 2)), <=(50:DECIMAL(12, 2), $21), <=($21, 250:DECIMAL(12, 2))), IS NOT NULL($22))])
+        HiveProject(ss_cdemo_sk=[$3], ss_hdemo_sk=[$4], ss_addr_sk=[$5], ss_quantity=[$9], ss_ext_sales_price=[$14], ss_ext_wholesale_cost=[$15], ss_sold_date_sk=[$22], EXPR$0=[BETWEEN(false, $21, 100:DECIMAL(12, 2), 200:DECIMAL(12, 2))], EXPR$1=[BETWEEN(false, $21, 150:DECIMAL(12, 2), 300:DECIMAL(12, 2))], EXPR$2=[BETWEEN(false, $21, 50:DECIMAL(12, 2), 250:DECIMAL(12, 2))], EXPR$5=[BETWEEN(false, $12, 100:DECIMAL(3, 0), 150:DECIMAL(3, 0))], EXPR$8=[BETWEEN(false, $12, 50:DECIMAL(3, 0), 100:DECIMAL(3, 0))], EXPR$11=[BETWEEN(false, $12, 150:DECIMAL(3, 0), 200:DECIMAL(3, 0))])
+        HiveFilter(condition=[AND(IS NOT NULL($21), IS NOT NULL($12), IS NOT NULL($3), IS NOT NULL($5), IS NOT NULL($4), IS NOT NULL($6), IS NOT NULL($22))])

Review Comment:
   The disjunction is simplified to IS NOT NULL as discussed under: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.n5m7m06tr31a
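   A quick sanity check of why the simplification is sound (the values are arbitrary): each disjunction pairs a lower bound and an upper bound that together cover the whole domain, so it only ever rejects NULLs.

       -- For any non-NULL x, x >= 100 OR x <= 150 is a tautology,
       -- so OR(<=(100, $12), <=($12, 150), ...) reduces to IS NOT NULL($12).
       SELECT 200 >= 100 OR 200 <= 150;                              -- true
       SELECT 0 >= 100 OR 0 <= 150;                                  -- true
       SELECT CAST(NULL AS INT) >= 100 OR CAST(NULL AS INT) <= 150;  -- NULL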
########## iceberg/iceberg-handler/src/test/results/positive/update_iceberg_copy_on_write_partitioned.q.out: ##########
@@ -71,10 +71,10 @@ STAGE PLANS:
           Map Operator Tree:
               TableScan
                 alias: tbl_ice
-                filterExpr: (((b <> 'one') and (b <> 'four') and (a <> 22)) or ((a = 22) or (b) IN ('one', 'four')) is null) (type: boolean)
+                filterExpr: (((b) IN ('four', 'one') or (a = 22)) is null or (((b < 'four') or ((b > 'four') and (b < 'one')) or (b > 'one')) and (a <> 22))) (type: boolean)

Review Comment:
   This is a direct consequence of the fact that the `<>` operator is simplified to `SEARCH`. When we expand SEARCH we end up with this inequality of ranges. The new expression is slightly less efficient than the previous one, but we can probably live with it. I logged https://issues.apache.org/jira/browse/HIVE-28911 as a follow-up improvement.
   It is also discussed here: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.nfxo4n2hxorj
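   To spell out the mechanics (a sketch, not the PR's exact code path): the conjunction of `<>` predicates is collapsed into a single SEARCH whose Sarg is a complement range set, and expanding that Sarg back into comparisons yields the range disjunction seen in the new filterExpr.

       -- b <> 'one' AND b <> 'four' is represented as SEARCH(b, Sarg[
       --   (-inf..'four'), ('four'..'one'), ('one'..+inf)]), and
       -- RexUtil.expandSearch renders the Sarg back as range comparisons:
       SELECT * FROM tbl_ice
       WHERE (b < 'four') OR ((b > 'four') AND (b < 'one')) OR (b > 'one');
       -- equivalent to, but bulkier than, the original conjunction of <> predicates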
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: ##########
@@ -683,6 +687,20 @@ public static ImmutableList<RexNode> getPredsNotPushedAlready(Set<String> predicatesToExclude
     predicatesToExclude.addAll(predicatesInSubtree);
     return newConjuncts.build();
   }
+
+  private static RexNode simplify(RexSimplify simplifier, RexNode node) {
+    RexNode result = node;
+    int maxTries = 5;
+    for (int i = 0; i < maxTries; i++) {
+      RexNode simplified = simplifier.simplify(result);
+      if (simplified.equals(result)) {
+        break;
+      }
+      result = simplified;
+    }
+
+    return result;

Review Comment:
   Fixed by https://github.com/apache/hive/pull/5196/commits/f24c0b9cc86248c231795db9f41c61c8aab5a139. Marking this as resolved.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: ##########
@@ -77,6 +77,7 @@
 import org.apache.calcite.util.mapping.Mappings;
 import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
 import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule;

Review Comment:
   Changes were reverted in the latest commits, so this discussion is not relevant anymore.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ##########
@@ -869,6 +875,12 @@ public ASTNode visitInputRef(RexInputRef inputRef) {

   @Override
   public ASTNode visitLiteral(RexLiteral literal) {
+    if (!RexUtil.isNull(literal) && literal.getType().isStruct()) {
+      return rexBuilder
+          .makeCall(SqlStdOperatorTable.ROW, (List<? extends RexNode>) literal.getValue())
+          .accept(this);
+    }
+

Review Comment:
   With the latest changes ASTConverter is left completely unchanged, so this discussion is not relevant anymore. SEARCH is expanded before hitting the ASTConverter.

########## ql/src/test/results/clientpositive/llap/pcs.q.out: ##########
@@ -1286,16 +1293,18 @@
 PREHOOK: type: QUERY
 PREHOOK: Input: default@pcs_t1
 PREHOOK: Input: default@pcs_t1@ds=2000-04-08
 PREHOOK: Input: default@pcs_t1@ds=2000-04-09
+PREHOOK: Input: default@pcs_t1@ds=2000-04-10
 #### A masked pattern was here ####
 POSTHOOK: query: explain extended select ds from pcs_t1 where struct(ds, key, rand(100)) in (struct('2000-04-08',1,0.2), struct('2000-04-09',2,0.3))
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@pcs_t1
 POSTHOOK: Input: default@pcs_t1@ds=2000-04-08
 POSTHOOK: Input: default@pcs_t1@ds=2000-04-09
+POSTHOOK: Input: default@pcs_t1@ds=2000-04-10
 #### A masked pattern was here ####
 OPTIMIZED SQL: SELECT `ds`
 FROM `default`.`pcs_t1`
-WHERE (`ds`, `key`, RAND(100)) IN ( ('2000-04-08', 1, 0.2), ('2000-04-09', 2, 0.3))
+WHERE (`ds`, `key`, RAND(100)) = ('2000-04-08', 1, 0.2) OR (`ds`, `key`, RAND(100)) = ('2000-04-09', 2, 0.3)

Review Comment:
   I restored the test to its original state since the latest commits fixed the problem with partition pruning. Marking this as resolved.

########## ql/src/test/results/clientpositive/llap/pcs.q.out: ##########
@@ -1355,6 +1364,40 @@ STAGE PLANS:
                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                  input format: org.apache.hadoop.mapred.TextInputFormat
+                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+                  properties:
+                    bucketing_version 2
+                    column.name.delimiter ,
+                    columns key,value
+                    columns.comments
+                    columns.types int:string
+#### A masked pattern was here ####
+                    name default.pcs_t1
+                    partition_columns ds
+                    partition_columns.types string
+                    serialization.format 1
+                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                name: default.pcs_t1
+              name: default.pcs_t1
+            Partition
+              input format: org.apache.hadoop.mapred.TextInputFormat
+              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+              partition values:
+                ds 2000-04-10
+              properties:
+                column.name.delimiter ,
+                columns key,value
+                columns.types int:string
+#### A masked pattern was here ####
+                name default.pcs_t1
+                partition_columns ds
+                partition_columns.types string
+                serialization.format 1
+                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+

Review Comment:
   Latest commits fixed the problem with partition pruning so marking this as resolved.