zabetak commented on code in PR #5196: URL: https://github.com/apache/hive/pull/5196#discussion_r2048510211
########## ql/src/test/queries/clientpositive/pointlookup3.q: ########## @@ -1,4 +1,6 @@ --! qt:dataset:src +-- SORT_QUERY_RESULTS + Review Comment: The sorting was reverted by the latest commits so marking this as resolved. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); Review Comment: It's not redundant with the latest commits since SEARCH is expanded to IN, BETWEEN, etc., before hitting the JDBC rules. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java: ########## @@ -198,6 +208,12 @@ public Double visitCall(RexCall call) { return null; } + if (call.getKind() == SqlKind.SEARCH) { + RexCall expandedCall = (RexCall) RexUtil.expandSearch(rexBuilder, null, call); + + return visitCall(expandedCall); + } + Review Comment: In the end, we adopted an expansion approach so SEARCH is expanded before hitting this part. The expansion still creates IN and BETWEEN so the rule remains mostly as it was before. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs + if (sarg.isPoints()) { + selectivity = computeFunctionSelectivity(call); + if (selectivity != null) { + selectivity = selectivity * sarg.pointCount; + if (selectivity <= 0.0) { + selectivity = 0.10; + } else if (selectivity >= 1.0) { + selectivity = 1.0; + } + } + break; + + } else { + return visitCall((RexCall) + transformOrToInAndInequalityToBetween( Review Comment: With the latest changes the expansion is done before hitting the estimator and this class remains mostly unchanged. ########## ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java: ########## @@ -508,8 +485,16 @@ public void testComputeRangePredicateSelectivityBetweenWithNULLS() { public void testComputeRangePredicateSelectivityNotBetweenWithNULLS() { doReturn((double) 20).when(tableMock).getRowCount(); doReturn(Collections.singletonList(stats)).when(tableMock).getColStat(Collections.singletonList(0)); - RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.BETWEEN, boolTrue, inputRef0, int1, int3); + RexNode filter = makeNotBetween(inputRef0, int1, int3); FilterSelectivityEstimator estimator = new FilterSelectivityEstimator(scan, mq); Assert.assertEquals(0.55, estimator.estimateSelectivity(filter), DELTA); } + + private RexNode makeNotBetween(RexNode inputRef, RexNode left, RexNode right) { + RexNode notBetween = REX_BUILDER + .makeCall(SqlStdOperatorTable.NOT, REX_BUILDER.makeBetween(inputRef, left, right)); + RexSimplify simplify = new RexSimplify(REX_BUILDER, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); + + return simplify.simplify(notBetween); Review Comment: The call to simplify is strange. What exactly do we want to test here? Can't we create the simplified expression directly?
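For illustration, constructing the already-simplified predicate directly, i.e. `OR($0 < 1, $0 > 3)` instead of simplifying `NOT(BETWEEN)`, could look roughly like the standalone sketch below. The `inputRef0`/`int1`/`int3` names only mirror the test fixtures; everything here is a hypothetical reconstruction, not the actual test code:

```java
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class NotBetweenDirectSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder rexBuilder = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);

    RexNode inputRef0 = rexBuilder.makeInputRef(intType, 0);
    RexNode int1 = rexBuilder.makeExactLiteral(BigDecimal.ONE, intType);
    RexNode int3 = rexBuilder.makeExactLiteral(BigDecimal.valueOf(3), intType);

    // Build OR($0 < 1, $0 > 3) directly, i.e. the simplified form of
    // NOT($0 BETWEEN 1 AND 3), without going through RexSimplify at all.
    RexNode notBetween = rexBuilder.makeCall(SqlStdOperatorTable.OR,
        rexBuilder.makeCall(SqlStdOperatorTable.LESS_THAN, inputRef0, int1),
        rexBuilder.makeCall(SqlStdOperatorTable.GREATER_THAN, inputRef0, int3));
    System.out.println(notBetween);
  }
}
```

Building the expression this way would pin down the exact shape the test exercises instead of depending on whatever RexSimplify happens to produce.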
########## ql/src/test/queries/clientpositive/search_nullability.q: ########## Review Comment: Removed in https://github.com/apache/hive/pull/5196/commits/56965963bf6bee744a6d3a6ef9bf1caf60e4f411. ########## ql/src/test/queries/clientpositive/search.q: ########## Review Comment: Removed in https://github.com/apache/hive/pull/5196/commits/56965963bf6bee744a6d3a6ef9bf1caf60e4f411. If we are not adding extra coverage then we don't need these tests. We can always add more tests if necessary even after the PR is merged. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java: ########## @@ -179,7 +178,7 @@ protected HivePointLookupOptimizerRule( this.minNumORClauses = minNumORClauses; } - public RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition) { + public static RexNode analyzeRexNode(RexBuilder rexBuilder, RexNode condition, int minNumORClauses) { Review Comment: All these changes to HivePointLookupOptimizerRule were reverted so marking this as resolved. ########## ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java: ########## @@ -508,8 +492,19 @@ public void testComputeRangePredicateSelectivityBetweenWithNULLS() { public void testComputeRangePredicateSelectivityNotBetweenWithNULLS() { doReturn((double) 20).when(tableMock).getRowCount(); doReturn(Collections.singletonList(stats)).when(tableMock).getColStat(Collections.singletonList(0)); - RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.BETWEEN, boolTrue, inputRef0, int1, int3); + RexNode filter = makeNotBetween(inputRef0, int1, int3); FilterSelectivityEstimator estimator = new FilterSelectivityEstimator(scan, mq); Assert.assertEquals(0.55, estimator.estimateSelectivity(filter), DELTA); } + + private RexNode makeNotBetween(RexNode inputRef, RexNode left, RexNode right) { + RexNode withOr = REX_BUILDER.makeCall( + SqlStdOperatorTable.OR, + REX_BUILDER.makeCall(SqlStdOperatorTable.LESS_THAN, inputRef, left), + REX_BUILDER.makeCall(SqlStdOperatorTable.GREATER_THAN, inputRef, right) + ); + RexSimplify simplify = new RexSimplify(REX_BUILDER, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); + + return simplify.simplify(withOr); Review Comment: The changes were reverted by https://github.com/apache/hive/pull/5196/commits/1c215e52ce820ae77128a6eac45fa89f63638729. The unit tests are in place to ensure that the code that is handling the SqlKind.BETWEEN operator in FilterSelectivityEstimator works and has a certain behavior. As long as the SqlKind.BETWEEN logic is still there, it's not appropriate to modify the tests. If we want to test that RexBuilder#makeBetween generates an expression with the same selectivity then it should be done as another test but it's probably outside the scope of the upgrade. ########## ql/src/test/queries/clientpositive/cardinality_preserving_join_opt2.q: ########## @@ -2,6 +2,7 @@ set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.strict.checks.cartesian.product=false; set hive.stats.fetch.column.stats=true; +set hive.cbo.rule.exclusion.regex=HivePreFilteringRule; Review Comment: The problem was fixed by f24c0b9cc86248c231795db9f41c61c8aab5a139 and the changes to the test were reverted.
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: Review Comment: It's not dead code based on the latest approach where we continue to use IN and BETWEEN. ########## ql/src/test/results/clientpositive/llap/explainuser_4.q.out: ########## @@ -119,24 +119,24 @@ Stage-0 PARTITION_ONLY_SHUFFLE [RS_11] Group By Operator [GBY_10] (rows=1 width=8) Output:["_col0"],aggregations:["count()"] - Merge Join Operator [MERGEJOIN_29] (rows=4626 width=8) + Merge Join Operator [MERGEJOIN_29] (rows=3449 width=8) Conds:RS_32._col0=RS_35._col0(Inner) <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_32] PartitionCols:_col0 - Select Operator [SEL_31] (rows=3078 width=2) + Select Operator [SEL_31] (rows=2297 width=2) Output:["_col0"] - Filter Operator [FIL_30] (rows=3078 width=2) - predicate:cint BETWEEN 1000000 AND 3000000 + Filter Operator [FIL_30] (rows=2297 width=2) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000) TableScan [TS_0] (rows=12288 width=2) default@alltypesorc,a,Tbl:COMPLETE,Col:COMPLETE,Output:["cint"] <-Map 4 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_35] PartitionCols:_col0 - Select Operator [SEL_34] (rows=2298 width=2) + Select Operator [SEL_34] (rows=1715 width=2) Output:["_col0"] - Filter Operator [FIL_33] (rows=2298 width=8) - predicate:(cint BETWEEN 1000000 AND 3000000 and cbigint is not null) + Filter Operator [FIL_33] (rows=1715 width=8) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000 and cbigint is not null) Review Comment: Part of the redundancy is also due to Hive so I logged https://issues.apache.org/jira/browse/HIVE-28910. I don't think this is a blocker for the PR so I am resolving this conversation. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs Review Comment: Not relevant after latest changes. Resolving conversation. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java: ########## @@ -80,7 +82,12 @@ private HivePushdownSnapshotFilterRule(Config config) { @Override public void onMatch(RelOptRuleCall call) { HiveFilter filter = call.rel(0); - RexNode newCondition = filter.getCondition().accept(new SnapshotIdShuttle(call.builder().getRexBuilder(), call.getMetadataQuery(), filter)); + RexNode expandedCondition = RexUtil.expandSearch( + filter.getCluster().getRexBuilder(), null, filter.getCondition() + ); Review Comment: The problem was addressed by two changes: expanding SEARCH via another rule in the same phase, and avoiding Sarg literals by changing the type check (https://github.com/apache/hive/pull/5196/commits/d7e97fde183ee14957665fd8d45a15941f09106e).
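For context, here is a minimal standalone sketch of the expansion approach described above, using only the Calcite calls that also appear in the diffs (`RexBuilder#makeIn`, `RexUtil#expandSearch`); the column and literals are made up:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.type.SqlTypeName;

public class ExpandSearchSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder rexBuilder = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RexNode ref = rexBuilder.makeInputRef(intType, 0);

    // makeIn over literal values typically yields SEARCH($0, Sarg[1, 2, 3]).
    RexNode search = rexBuilder.makeIn(ref, ImmutableList.of(
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(1), intType),
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(2), intType),
        rexBuilder.makeExactLiteral(BigDecimal.valueOf(3), intType)));
    System.out.println(search);

    // expandSearch rewrites the Sarg back into plain comparisons, e.g.
    // OR(=($0, 1), =($0, 2), =($0, 3)); passing a null RexProgram is allowed.
    System.out.println(RexUtil.expandSearch(rexBuilder, null, search));
  }
}
```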
########## ql/src/test/queries/clientpositive/pcs.q: ########## @@ -58,7 +58,7 @@ SORT BY A.key, A.value, A.ds; explain extended select ds from pcs_t1 where struct(case when ds='2000-04-08' then 10 else 20 end) in (struct(10),struct(11)); select ds from pcs_t1 where struct(case when ds='2000-04-08' then 10 else 20 end) in (struct(10),struct(11)); -explain extended select ds from pcs_t1 where struct(ds, key, rand(100)) in (struct('2000-04-08',1,0.2), struct('2000-04-09',2,0.3)); +explain extended select ds from pcs_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); Review Comment: Latest commits fixed the problem with partition pruning so we don't need a JIRA ticket for now. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); + case SEARCH: + return expandSearchAndRemoveRowOperator(rexBuilder, call); default: break; } } return RexUtil.isFlat(node) ? node : RexUtil.flatten(rexBuilder, node); } + + private RexNode expandSearchAndRemoveRowOperator(RexBuilder rexBuilder, RexCall expression) { + RexLiteral literal = (RexLiteral) expression.getOperands().get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + + if (sarg.isPoints()) { + // if it is a row operator, we'll have + // SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]] Review Comment: I addressed the problem slightly differently to avoid changing the rule too much. SEARCH is expanded to IN/BETWEEN/etc., before launching the JDBC rules (https://github.com/apache/hive/pull/5196/commits/98f809687a3054b15f1ccf721e52e8e634d43d02). Some changes were necessary in RexNodeConverter to handle the STRUCT literals that don't explicitly contain a ROW operator. 
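For illustration, the OR-of-ANDs shape that `transformIntoOrAndClause` is described as producing for a two-column IN could be built roughly as follows; `c1`/`c2` and the literal values are hypothetical, and this is only the target shape, not the rule's actual code:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class RowInExpansionSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder b = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RexNode c1 = b.makeInputRef(intType, 0);
    RexNode c2 = b.makeInputRef(intType, 1);
    RexNode one = b.makeExactLiteral(BigDecimal.ONE, intType);
    RexNode two = b.makeExactLiteral(BigDecimal.valueOf(2), intType);
    RexNode three = b.makeExactLiteral(BigDecimal.valueOf(3), intType);
    RexNode four = b.makeExactLiteral(BigDecimal.valueOf(4), intType);

    // Target shape for ($0, $1) IN ((1, 2), (3, 4)):
    // OR(AND(=($0, 1), =($1, 2)), AND(=($0, 3), =($1, 4)))
    RexNode expanded = RexUtil.composeDisjunction(b, ImmutableList.of(
        RexUtil.composeConjunction(b, ImmutableList.of(
            b.makeCall(SqlStdOperatorTable.EQUALS, c1, one),
            b.makeCall(SqlStdOperatorTable.EQUALS, c2, two))),
        RexUtil.composeConjunction(b, ImmutableList.of(
            b.makeCall(SqlStdOperatorTable.EQUALS, c1, three),
            b.makeCall(SqlStdOperatorTable.EQUALS, c2, four)))));
    System.out.println(expanded);
  }
}
```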
########## ql/src/test/results/clientpositive/llap/cbo_aggregate_reduce_functions_rule.q.out: ########## @@ -158,7 +158,7 @@ POSTHOOK: type: QUERY POSTHOOK: Input: default@test #### A masked pattern was here #### CBO PLAN: -HiveProject(_o__c0=[COALESCE($0, 0E0:DOUBLE)], _o__c1=[COALESCE($1, 0E0:DOUBLE)], _o__c2=[COALESCE($2, 0E0:DOUBLE)], _o__c3=[COALESCE($3, 0E0:DOUBLE)], _o__c4=[COALESCE($4, 0E0:DOUBLE)], _o__c5=[COALESCE($5, 0E0:DOUBLE)], _o__c6=[/($6, $7)], _o__c7=[/($1, $8)], _o__c8=[/($9, $10)], _o__c9=[/($3, $11)], _o__c10=[/($12, $13)], _o__c11=[/($5, $14)], _o__c12=[POWER(/(-($15, /(*($6, $6), $7)), $7), 0.5:DECIMAL(2, 1))], _o__c13=[POWER(/(-($16, /(*($1, $1), $8)), $8), 0.5:DECIMAL(2, 1))], _o__c14=[POWER(/(-($17, /(*($9, $9), $10)), $10), 0.5:DECIMAL(2, 1))], _o__c15=[POWER(/(-($18, /(*($3, $3), $11)), $11), 0.5:DECIMAL(2, 1))], _o__c16=[POWER(/(-($19, /(*($12, $12), $13)), $13), 0.5:DECIMAL(2, 1))], _o__c17=[POWER(/(-($20, /(*($5, $5), $14)), $14), 0.5:DECIMAL(2, 1))], _o__c18=[POWER(/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1))), 0.5:DECIMAL(2, 1))], _o__c19=[POWER(/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1))), 0.5:DECIMAL(2, 1))], _o__c20=[POWER(/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1))), 0.5:DECIMAL(2, 1))], _o__c21=[POWER(/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1))), 0.5:DECIMAL(2, 1))], _o__c22=[POWER(/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1))), 0.5:DECIMAL(2, 1))], _o__c23=[POWER(/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1))), 0.5:DECIMAL(2, 1))], _o__c24=[/(-($15, /(*($6, $6), $7)), $7)], _o__c25=[/(-($16, /(*($1, $1), $8)), $8)], _o__c26=[/(-($17, /(*($9, $9), $10)), $10)], _o__c27=[/(-($18, /(*($3, $3), $11)), $11)], _o__c28=[/(-($19, /(*($12, $12), $13)), $13)], _o__c29=[/(-($20, /(*($5, $5), $14)), $14)], _o__c30=[/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1)))], _o__c31=[/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1)))], _o__c32=[/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1)))], _o__c33=[/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1)))], _o__c34=[/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1)))], _o__c35=[/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1)))], _o__c36=[$0], _o__c37=[$1], _o__c38=[$2], _o__c39=[$3], _o__c40=[$4], _o__c41=[$5], _o__c42=[$21], _o__c43=[$8], _o__c44=[$22], _o__c45=[$11], _o__c46=[$23], _o__c47=[$14]) +HiveProject(_o__c0=[CAST(COALESCE($0, 0E0:DOUBLE)):DOUBLE], _o__c1=[CAST(COALESCE($1, 0E0:DOUBLE)):DOUBLE], _o__c2=[CAST(COALESCE($2, 0E0:DOUBLE)):DOUBLE], _o__c3=[CAST(COALESCE($3, 0E0:DOUBLE)):DOUBLE], _o__c4=[CAST(COALESCE($4, 0E0:DOUBLE)):DOUBLE], _o__c5=[CAST(COALESCE($5, 0E0:DOUBLE)):DOUBLE], _o__c6=[/($6, $7)], _o__c7=[/($1, $8)], _o__c8=[/($9, $10)], _o__c9=[/($3, $11)], _o__c10=[/($12, $13)], _o__c11=[/($5, $14)], _o__c12=[POWER(/(-($15, /(*($6, $6), $7)), $7), 0.5:DECIMAL(2, 1))], _o__c13=[POWER(/(-($16, /(*($1, $1), $8)), $8), 0.5:DECIMAL(2, 1))], _o__c14=[POWER(/(-($17, /(*($9, $9), $10)), $10), 0.5:DECIMAL(2, 1))], _o__c15=[POWER(/(-($18, /(*($3, $3), $11)), $11), 0.5:DECIMAL(2, 1))], _o__c16=[POWER(/(-($19, /(*($12, $12), $13)), $13), 0.5:DECIMAL(2, 1))], _o__c17=[POWER(/(-($20, /(*($5, $5), $14)), $14), 0.5:DECIMAL(2, 1))], _o__c18=[POWER(/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1))), 0.5:DECIMAL(2, 1))], _o__c19=[POWER(/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1))), 0.5:DECIMAL(2, 1))], _o__c20=[POWER(/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1))), 0.5:DECIMAL(2, 1))], _o__c21=[POWER(/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1))), 0.5:DECIMAL(2, 1))], _o__c22=[POWER(/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1))), 0.5:DECIMAL(2, 1))], _o__c23=[POWER(/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1))), 0.5:DECIMAL(2, 1))], _o__c24=[/(-($15, /(*($6, $6), $7)), $7)], _o__c25=[/(-($16, /(*($1, $1), $8)), $8)], _o__c26=[/(-($17, /(*($9, $9), $10)), $10)], _o__c27=[/(-($18, /(*($3, $3), $11)), $11)], _o__c28=[/(-($19, /(*($12, $12), $13)), $13)], _o__c29=[/(-($20, /(*($5, $5), $14)), $14)], _o__c30=[/(-($15, /(*($6, $6), $7)), CASE(=($7, 1), null:BIGINT, -($7, 1)))], _o__c31=[/(-($16, /(*($1, $1), $8)), CASE(=($8, 1), null:BIGINT, -($8, 1)))], _o__c32=[/(-($17, /(*($9, $9), $10)), CASE(=($10, 1), null:BIGINT, -($10, 1)))], _o__c33=[/(-($18, /(*($3, $3), $11)), CASE(=($11, 1), null:BIGINT, -($11, 1)))], _o__c34=[/(-($19, /(*($12, $12), $13)), CASE(=($13, 1), null:BIGINT, -($13, 1)))], _o__c35=[/(-($20, /(*($5, $5), $14)), CASE(=($14, 1), null:BIGINT, -($14, 1)))], _o__c36=[$0], _o__c37=[$1], _o__c38=[$2], _o__c39=[$3], _o__c40=[$4], _o__c41=[$5], _o__c42=[$21], _o__c43=[$8], _o__c44=[$22], _o__c45=[$11], _o__c46=[$23], _o__c47=[$14]) Review Comment: Why do we have extra `CAST` around the `COALESCE` call? It seems that the COALESCE returns a double so the CAST seems redundant. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); Review Comment: There was an attempt to produce NOT IN using HiveReduceSearchComplexityRule.java introduced by @soumyakanti3578 but later it was removed by https://github.com/apache/hive/pull/5196/commits/af781f834feb39cd41a06c392f97f3c6969acd84. The fact that NOT IN is not used is fine because it never appeared in the plans (.q.out files) before the upgrade. Nevertheless, the plan before the upgrade was slightly more performant in terms of CPU since there were fewer comparisons and operators involved. We can possibly improve the performance by fixing https://issues.apache.org/jira/browse/HIVE-28911 but this can wait a bit.
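A hedged sketch of the points-to-IN step discussed here, mirroring the Sarg handling visible in the diff above (the helper name and its contract are made up for illustration):

```java
import com.google.common.collect.Range;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.util.Sarg;

/** Hypothetical helper mirroring the points-to-IN branch shown in the diff. */
final class SargPointsSketch {
  private SargPointsSketch() {
  }

  /**
   * Returns the literal operands for an IN(ref, v1, v2, ...) call when the
   * Sarg of the given SEARCH call consists only of points; null otherwise.
   */
  static List<RexNode> pointsAsInOperands(RexBuilder rexBuilder, RexCall search) {
    RexLiteral literal = (RexLiteral) search.getOperands().get(1);
    Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
    if (!sarg.isPoints()) {
      return null;
    }
    List<RexNode> values = new ArrayList<>();
    for (Range<?> range : sarg.rangeSet.asRanges()) {
      // For a point range the lower and upper endpoints coincide,
      // so either endpoint yields the IN list value.
      values.add(rexBuilder.makeLiteral(range.lowerEndpoint(), literal.getType(), true, true));
    }
    return values;
  }
}
```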
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1067,9 +1079,17 @@ public ASTNode visitCall(RexCall call) { return SqlFunctionConverter.buildAST(SqlStdOperatorTable.NOT, Collections.singletonList(SqlFunctionConverter.buildAST(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, astNodeLst, call.getType())), call.getType()); case CAST: - assert(call.getOperands().size() == 1); + if (call.getOperands().size() != 1) { + throw new IllegalArgumentException("CASTs should have only 1 operand"); + } + RexNode castOperand = call.getOperands().get(0); + + // Extract RexNode out of CAST when it's not a literal and the types are equal + if (!(castOperand instanceof RexLiteral) && call.getType().equals(castOperand.getType())) { + return castOperand.accept(this); + } Review Comment: This was indeed a bug/regression in Calcite. It was fixed by https://issues.apache.org/jira/browse/CALCITE-6954 and we patched Hive using https://github.com/apache/hive/pull/5196/commits/d8d7ebda6cdc3b2bf456c99fabe16bee6e1032b1 ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java: ########## @@ -220,6 +234,37 @@ public Double visitCall(RexCall call) { return cost + functionCost(call); } + + private Double getSearchCost(RexCall call) { + HiveCalciteUtil.SearchTransformer<Double> transformer = new HiveCalciteUtil.SearchTransformer<>(); + transformer.transform(rexBuilder, call, + new RexFunctionCost(this.rexBuilder) { + @Override + public Double visitLiteral(RexLiteral literal) { + return HiveRelMdSize.INSTANCE.typeValueSize(literal.getType(), literal.getValueAs(Comparable.class)); + } + + @Override + public Double visitInputRef(RexInputRef inputRef) { + return HiveRelMdSize.INSTANCE.averageTypeValueSize(inputRef.getType()); + } + }); + + List<Double> operandCosts = new ArrayList<>(); + + if (!transformer.inNodes.isEmpty()) { + Double inOperandCosts = transformer.inNodes.stream().filter(Objects::nonNull).reduce(1D, Double::sum); + Double inFunctionCost = 2d * (transformer.inNodes.size() - 1); + operandCosts.add(inOperandCosts + inFunctionCost); + } + operandCosts.addAll(transformer.nodes); + + Double searchOperandCost = operandCosts.stream().filter(Objects::nonNull).reduce(1D, Double::sum); + // functionCost is essentially the cost of conjunctions or disjunctions + Double functionCost = operandCosts.size() == 1 ? 0D : 1d * operandCosts.size(); + + return searchOperandCost + functionCost; + } Review Comment: Instead of this transformation logic it may be OK to extend the `org.apache.calcite.rel.metadata.RelMdSize#typeValueSize` method to be able to handle literals of SARG type which could be useful in other places where we might end up calling the same logic. This could be also something to contribute to the CALCITE repo. 
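A minimal sketch of that suggestion, assuming `RelMdSize#typeValueSize` can be overridden in a subclass and that summing the sizes of the range endpoints is an acceptable approximation; the class name and the fallback behavior are hypothetical:

```java
import com.google.common.collect.Range;
import org.apache.calcite.rel.metadata.RelMdSize;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.util.Sarg;

/**
 * Hypothetical subclass sketching the suggested extension: price a SARG
 * literal by summing the sizes of its range endpoints.
 */
public class SargAwareRelMdSize extends RelMdSize {
  @Override
  public Double typeValueSize(RelDataType type, Comparable value) {
    if (value instanceof Sarg) {
      double total = 0d;
      for (Range<?> r : ((Sarg<?>) value).rangeSet.asRanges()) {
        if (r.hasLowerBound()) {
          total += endpointSize(type, r.lowerEndpoint());
        }
        // Skip the upper endpoint of a point range to avoid counting it twice.
        if (r.hasUpperBound()
            && !(r.hasLowerBound() && r.lowerEndpoint().equals(r.upperEndpoint()))) {
          total += endpointSize(type, r.upperEndpoint());
        }
      }
      return total;
    }
    return super.typeValueSize(type, value);
  }

  private double endpointSize(RelDataType type, Comparable endpoint) {
    Double size = super.typeValueSize(type, endpoint);
    // Treat unknown endpoint sizes as zero; this sketch does not try to
    // pick a better fallback.
    return size == null ? 0d : size;
  }
}
```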
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); Review Comment: All changes to ASTConverter were reverted so this discussion is obsolete. Still there is a different behavior cause when we expand SEARCH we will always generate IN and never OR irrespective of thresholds but I would argue that this is what should happen. ########## iceberg/iceberg-handler/src/test/results/positive/llap/llap_iceberg_read_orc.q.out: ########## @@ -359,11 +359,11 @@ STAGE PLANS: Map Operator Tree: TableScan alias: o - filterExpr: ((quantity NOT BETWEEN 39 AND 0 or quantity NOT BETWEEN 69 AND 39 or (quantity > 70)) and (((quantity > 0) and (quantity < 39)) or ((quantity > 39) and (quantity < 69)) or (quantity > 70)) and itemid is not null) (type: boolean) + filterExpr: ((((quantity > 0) and (quantity < 39)) or ((quantity > 39) and (quantity < 69)) or (quantity > 70)) and quantity is not null and itemid is not null) (type: boolean) probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_29_container, bigKeyColName:itemid, smallTablePos:1, keyRatio:0.9523809523809523 Review Comment: Copying below the message from https://github.com/apache/hive/pull/5196/commits/af781f834feb39cd41a06c392f97f3c6969acd84 A. NOT BETWEEN disappears completely and is transformed to disjunction with range predicates. ``` not hr BETWEEN 12 AND 14 => (hr < 12) or (hr > 14) ``` This is a net positive change. The new version is equally efficient and easier to handle since it relies on primitive operators that are handled better in Calcite and Hive. Since NOT BETWEEN disappears more code could be potentially dropped but it is outside the scope of this commit and upgrade. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java: ########## @@ -92,6 +91,15 @@ public HiveJdbcImplementor(SqlDialect dialect, JavaTypeFactory typeFactory) { sqlCondition); return result(join, leftResult, rightResult); } + + @Override + public Result visit(TableScan scan) { + if (!(scan instanceof JdbcTableScan)) { + return super.visit(scan); + } + + return result(((JdbcTableScan) scan).jdbcTable.tableName(), ImmutableList.of(Clause.FROM), scan, null); + } Review Comment: I still don't see why we need to override the default behavior. What errors are we getting if we don't add this logic? 
########## iceberg/iceberg-handler/src/test/results/positive/query_iceberg_metadata_of_unpartitioned_table.q.out: ########## Review Comment: Hopefully it will be fixed by https://github.com/apache/hive/pull/5196/commits/54adb497ed7fe89e0e7d64a16cec1fc513dcc174 ########## ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java: ########## @@ -292,7 +293,11 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode basePlan = executeProgram(basePlan, program.build(), mdProvider, executorProvider, singletonList(materialization)); program = new HepProgramBuilder(); - generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST, HivePushdownSnapshotFilterRule.INSTANCE); + generatePartialProgram(program, + false, + HepMatchOrder.DEPTH_FIRST, + HiveSearchRules.FILTER_SEARCH_EXPAND, Review Comment: We do the expansion on the filter only to allow the `HivePushdownSnapshotFilterRule` to trigger and work as before. The pushdown rule itself may re-introduce a SEARCH so at this point we don't strive to eliminate it from the plan. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java: ########## @@ -72,6 +78,22 @@ public HiveFilter(RelOptCluster cluster, RelTraitSet traits, RelNode child, RexN super(cluster, TraitsUtil.getDefaultTraitSet(cluster), child, condition); this.correlationInfos = new CorrelationInfoSupplier(getCondition()); } + + @Override + public RelWriter explainTerms(RelWriter pw) { + // Remove this method after upgrading Calcite to 1.35+ Review Comment: Changes were reverted in latest commits so this is not relevant anymore. Resolving the conversation. ########## iceberg/iceberg-handler/src/test/results/positive/delete_iceberg_mixed.q.out: ########## @@ -84,17 +84,17 @@ Stage-4 <-Reducer 2 [CONTAINS] File Output Operator [FS_46] table:{"name:":"default.ice01"} - Select Operator [SEL_44] (rows=3 width=295) + Select Operator [SEL_44] (rows=3 width=266) Output:["_col0","_col1","_col2","_col3","_col4","_col5"] - Merge Join Operator [MERGEJOIN_43] (rows=3 width=295) + Merge Join Operator [MERGEJOIN_43] (rows=3 width=266) Conds:RS_57._col4=RS_63._col0(Left Semi),Output:["_col0","_col1","_col2","_col3","_col4","_col5"] <-Map 1 [SIMPLE_EDGE] vectorized SHUFFLE [RS_57] PartitionCols:_col4 - Select Operator [SEL_55] (rows=5 width=295) + Select Operator [SEL_55] (rows=6 width=280) Review Comment: Not relevant anymore since these stat changes are not present in latest patch. Resolving the conversation. 
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: ########## @@ -656,7 +658,9 @@ public static ImmutableList<RexNode> getPredsNotPushedAlready(Set<String> predic } // Build map to not convert multiple times, further remove already included predicates Map<String,RexNode> stringToRexNode = Maps.newLinkedHashMap(); + RexSimplify simplify = new RexSimplify(inp.getCluster().getRexBuilder(), RelOptPredicateList.EMPTY, RexUtil.EXECUTOR); for (RexNode r : predsToPushDown) { + r = simplify.simplify(r); Review Comment: The StackOverflow is addressed by https://github.com/apache/hive/pull/5196/commits/f24c0b9cc86248c231795db9f41c61c8aab5a139, which makes HivePreFilteringRule perform the reduction. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java: ########## @@ -268,6 +270,7 @@ private RexNode convert(ExprNodeGenericFuncDesc func) throws SemanticException { // except complex types // Rewrite to OR is done only if number of operands are less than // the threshold configured + // TODO Is this rewrite unconditional? Review Comment: All TODOs need to be removed before merging. We should create JIRAs if there are remaining tasks to follow up. (https://github.com/apache/hive/pull/5196/commits/612caceddf24f149e30d8d118dcd55f920720eeb). ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java: ########## @@ -22,8 +22,6 @@ import org.apache.calcite.rel.RelNode; import org.apache.calcite.rel.RelShuttle; import org.apache.calcite.rel.core.Filter; -import org.apache.calcite.rel.core.Join; -import org.apache.calcite.rel.core.TableScan; Review Comment: Restored. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java: ########## @@ -138,6 +143,18 @@ public boolean isSynthetic() { @Override public RelWriter explainTerms(RelWriter pw) { + // Remove this if block after upgrading Calcite to 1.35+ + if (CALCITE_VERSION < 35 + && pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES && getDigest().contains("SEARCH")) { + return ((HiveProject) this.accept( + RexUtil.searchShuttle( + new RexBuilder(new JavaTypeFactoryImpl(new HiveTypeSystemImpl())), + null, + -1 + ) + )).explainTerms(pw); + } Review Comment: Changes were reverted in latest commits so marking this as resolved. ########## packaging/pom.xml: ########## @@ -173,6 +173,10 @@ \Qhttps://www.eclipse.org/org/documents/epl-v10.php\E \Qhttp://www.eclipse.org/org/documents/epl-v10.php\E </EPL-1.0> + <EPL-2.0> + \Qhttps://raw.githubusercontent.com/locationtech/jts/master/LICENSE_EPLv2.txt\E + \Qhttps://www.eclipse.org/org/documents/epl-2.0/EPL-2.0.txt\E + </EPL-2.0> Review Comment: As part of the upgrade we need to add an explicit dependency to jts-core. The respective jar will be packaged in the binary distribution of Hive so we need to have its license packaged as well. The changes here are necessary and useful to avoid duplicate licenses. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java: ########## Review Comment: All TODOs need to be removed before merging. We should create JIRAs if there are remaining tasks to follow up. (https://github.com/apache/hive/pull/5196/commits/612caceddf24f149e30d8d118dcd55f920720eeb) ########## ql/src/test/results/clientpositive/llap/excluded_rule_explain.q.out: ########## Review Comment: https://issues.apache.org/jira/browse/CALCITE-6832 As said before, this is not urgent. We can fix it after merging the PR.
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // convert Sarg to IN when they are points. + if (sarg.isPoints()) { + // just expand SEARCH to ORs when point count is less than HIVE_POINT_LOOKUP_OPTIMIZER_MIN + if (sarg.pointCount < minOrClauses) { + return visitCall((RexCall) call.accept(RexUtil.searchShuttle(rexBuilder, null, -1))); + } + + // else convert to IN + for (Range<?> range : sarg.rangeSet.asRanges()) { + astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral( + range.lowerEndpoint(), literal.getType(), true, true))); + } + + return SqlFunctionConverter.buildAST(HiveIn.INSTANCE, astNodeLst, call.getType()); + // Expand SEARCH operator + } else { + return visitCall((RexCall) transformOrToInAndInequalityToBetween( + rexBuilder, + call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)), + minOrClauses + ) Review Comment: Now the transformation is done before hitting the AST converter and the logic is encapsulated in SearchTransformer. We can successfully convert to IN and BETWEEN but not to their negated counterparts. As discussed in other places it seems that we don't need to handle the NOT IN and NOT BETWEEN cases; at least not immediately. ########## ql/src/test/results/clientpositive/llap/cbo_join_transitive_pred_loop_1.q.out: ########## @@ -67,6 +67,6 @@ HiveProject(month=[$0], con_usd=[$2]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveProject(mth=[$0], con_usd=[$1]) - HiveFilter(condition=[IS NOT NULL($0)]) + HiveFilter(condition=[AND(IS NOT NULL($0), IN($0, 202110, 202503))]) HiveTableScan(table=[[default, test3]], table:alias=[d]) Review Comment: It seems correct to me. Isn't it the constant folding of the following expression? ```sql cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) ``` Notice that there is `add_months` with a `-1` argument. So if we take the current date and we subtract one month we go to March (03). ########## ql/src/test/results/clientpositive/llap/bucketpruning1.q.out: ########## @@ -1656,7 +1656,7 @@ POSTHOOK: type: QUERY POSTHOOK: Input: default@srcbucket_pruned #### A masked pattern was here #### OPTIMIZED SQL: SELECT * -FROM (VALUES (NULL, NULL, NULL)) AS `t` (`key`, `value`, `ds`) +FROM (SELECT NULL AS `key`, NULL AS `value`, NULL AS `ds`) AS `t` Review Comment: The before and after are equivalent and rather trivial so I don't think it matters. I guess we can always do the transformation when we have a single entry in VALUES, no matter whether it is NULL or not. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlAggFunction.java: ########## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite.functions; + +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.SqlFunctionCategory; +import org.apache.calcite.sql.SqlIdentifier; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.sql.type.SqlOperandTypeChecker; +import org.apache.calcite.sql.type.SqlOperandTypeInference; +import org.apache.calcite.sql.type.SqlReturnTypeInference; +import org.apache.calcite.util.Optionality; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelBuilder; +import org.checkerframework.checker.nullness.qual.Nullable; + +public class HiveSqlAggFunction extends SqlAggFunction { Review Comment: Instead of using HiveSqlAggFunction and passing it from the RelBuilder, an alternative approach has been committed in https://github.com/apache/hive/pull/5196/commits/c0469f75a50b2249ce8c8506e80c85a931394d63 to overcome the breaking changes in CALCITE-4342. CALCITE-4342 made rollup strategy a first-class citizen of the SqlAggFunction interface so each function can return its own rollup if available. It's no longer necessary to pass from the RelBuilder; basically, whenever we need to obtain a rollup we have to ask the SqlAggFunction directly. ########## ql/src/test/queries/clientpositive/sharedwork.q: ########## @@ -20,6 +20,24 @@ create table MY_TABLE_0001_01 ( col_1 string, col_100 string); +explain cbo SELECT + Table__323.col_7, + CAST(Table__323.col_3 AS DATE) col_3, + Table__323.col_20, + Table__1232.col_21 col_21_1232, + Table__323.col_1, + Table__133.col_22, + Table__879.col_21 col_21_879 + ,Table__133.col_23 +FROM MY_TABLE_0001 Table__323 +LEFT OUTER JOIN MY_TABLE_0003 Table__1232 ON (Table__323.col_20=Table__1232.col_24) +LEFT OUTER JOIN MY_TABLE_0001_00 Table__133 ON (Table__323.col_1=Table__133.col_1) +LEFT OUTER JOIN MY_TABLE_0003 Table__879 ON (Table__133.col_22=Table__879.col_24) +LEFT OUTER JOIN MY_TABLE_0001_01 Table__1215 ON (Table__323.col_1=Table__1215.col_1 and Table__1215.col_100 = 210) +WHERE 1=1 +AND (cast(Table__323.col_7 AS DOUBLE) IS NOT NULL OR Table__323.col_7 IS NULL) +AND CAST(Table__323.col_3 AS DATE) BETWEEN '2018-07-01' AND '2019-01-23' +AND Table__323.col_20 IN ('part1','part2','part3'); Review Comment: Changes were reverted so marking this as resolved. ########## ql/src/test/results/clientpositive/llap/bucketsortoptimize_insert_7.q.out: ########## @@ -95,10 +95,10 @@ STAGE PLANS: Map Operator Tree: TableScan alias: b - filterExpr: (key) IN (0, 5) (type: boolean) + filterExpr: ((key) IN (0, 5) and key is not null) (type: boolean) Statistics: Num rows: 84 Data size: 7896 Basic stats: COMPLETE Column stats: COMPLETE Review Comment: I think we can live with some redundant NOT NULL predicates for a while. The performance impact should be minimal and probably not noticeable. However, it would be nice to address a few of those so I logged https://issues.apache.org/jira/browse/HIVE-28910.
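As a quick way to check whether Calcite's simplifier already treats such an IS NOT NULL as redundant next to the IN/SEARCH predicate, here is a hypothetical standalone snippet (in a WHERE clause UNKNOWN behaves as FALSE, hence `simplifyUnknownAsFalse`); the column and values are invented:

```java
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.plan.RelOptPredicateList;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexSimplify;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class RedundantNotNullCheck {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RexBuilder b = new RexBuilder(typeFactory);
    // Use a nullable column type so IS NOT NULL is not trivially TRUE.
    RelDataType intType = typeFactory.createTypeWithNullability(
        typeFactory.createSqlType(SqlTypeName.INTEGER), true);
    RexNode key = b.makeInputRef(intType, 0);

    RexNode in = b.makeIn(key, ImmutableList.of(
        b.makeExactLiteral(BigDecimal.ZERO, intType),
        b.makeExactLiteral(BigDecimal.valueOf(5), intType)));
    RexNode notNull = b.makeCall(SqlStdOperatorTable.IS_NOT_NULL, key);

    // In a filter context, the IS NOT NULL conjunct should be absorbed
    // by the IN/SEARCH predicate during simplification.
    RexSimplify simplify = new RexSimplify(b, RelOptPredicateList.EMPTY, RexUtil.EXECUTOR);
    System.out.println(simplify.simplifyUnknownAsFalse(
        b.makeCall(SqlStdOperatorTable.AND, notNull, in)));
  }
}
```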
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RangeConverter.java: ########## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.optimizer.calcite; + +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlOperator; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.util.RangeSets; +import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveBetween; + +import java.util.ArrayList; +import java.util.List; + +class RangeConverter<C extends Comparable<C>> implements RangeSets.Consumer<C> { + + protected final RexBuilder rexBuilder; + protected final RelDataType type; + protected final RexNode ref; + public final List<RexNode> inNodes; + public final List<RexNode> nodes; Review Comment: Since tests are now green, I prefer to avoid touching the code for minor stylistic issues. If more changes are needed I will fix this; otherwise we can do it in a follow-up.
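For readers unfamiliar with the callback interface that RangeConverter implements, here is a small self-contained demo of `RangeSets.forEach` dispatching on the shape of each range (point, closed interval, ray); the printed strings are just placeholders for the RexNode construction the real class performs:

```java
import com.google.common.collect.ImmutableRangeSet;
import com.google.common.collect.Range;
import com.google.common.collect.RangeSet;
import org.apache.calcite.util.RangeSets;

public class RangeConsumerDemo {
  public static void main(String[] args) {
    RangeSet<Integer> ranges = ImmutableRangeSet.<Integer>builder()
        .add(Range.singleton(5))          // a point: candidate for IN / =
        .add(Range.closed(10, 20))        // a closed interval: candidate for BETWEEN
        .add(Range.greaterThan(30))       // an open ray: candidate for >
        .build();
    RangeSets.forEach(ranges, new RangeSets.Consumer<Integer>() {
      @Override public void singleton(Integer value) {
        System.out.println("point " + value);
      }
      @Override public void closed(Integer lower, Integer upper) {
        System.out.println("between " + lower + " and " + upper);
      }
      @Override public void greaterThan(Integer lower) {
        System.out.println("> " + lower);
      }
      // Remaining callbacks cover the other range shapes; left as no-ops here.
      @Override public void all() { }
      @Override public void atLeast(Integer lower) { }
      @Override public void atMost(Integer upper) { }
      @Override public void lessThan(Integer upper) { }
      @Override public void open(Integer lower, Integer upper) { }
      @Override public void closedOpen(Integer lower, Integer upper) { }
      @Override public void openClosed(Integer lower, Integer upper) { }
    });
  }
}
```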
########## iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out: ########## @@ -150,27 +150,27 @@ Stage-0 File Output Operator [FS_61] Limit [LIM_60] (rows=20 width=447) Number of rows:20 - Select Operator [SEL_59] (rows=791 width=447) + Select Operator [SEL_59] (rows=473 width=447) Output:["_col0","_col1","_col2","_col3","_col4"] <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_58] - Top N Key Operator [TNK_57] (rows=791 width=447) + Top N Key Operator [TNK_57] (rows=473 width=447) keys:_col0,top n:20 - Map Join Operator [MAPJOIN_56] (rows=791 width=447) + Map Join Operator [MAPJOIN_56] (rows=473 width=447) BucketMapJoin:true,Conds:SEL_55._col0, _col1=RS_53._col0, _col1(Inner),Output:["_col0","_col1","_col2","_col3","_col4"] <-Map 3 [CUSTOM_EDGE] vectorized, llap MULTICAST [RS_53] PartitionCols:_col0, _col1 - Select Operator [SEL_52] (rows=500 width=178) + Select Operator [SEL_52] (rows=387 width=178) Output:["_col0","_col1"] - Filter Operator [FIL_51] (rows=500 width=178) - predicate:((key <> '0') and (key <> '100') and value is not null) + Filter Operator [FIL_51] (rows=387 width=178) + predicate:(((key < '0') or ((key > '0') and (key < '100')) or (key > '100')) and key is not null and value is not null) TableScan [TS_3] (rows=500 width=178) default@src,b,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"] - <-Select Operator [SEL_55] (rows=500 width=269) + <-Select Operator [SEL_55] (rows=387 width=269) Output:["_col0","_col1","_col2"] - Filter Operator [FIL_54] (rows=500 width=269) - predicate:((key1 <> '0') and (key1 <> '100') and key2 is not null) + Filter Operator [FIL_54] (rows=387 width=269) + predicate:(((key1 < '0') or ((key1 > '0') and (key1 < '100')) or (key1 > '100')) and key1 is not null and key2 is not null) TableScan [TS_0] (rows=500 width=269) Review Comment: Yes, it is discussed here: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.nfxo4n2hxorj I also logged https://issues.apache.org/jira/browse/HIVE-28911 but I think it's not blocking for merging the PR. ########## ql/src/test/results/clientpositive/llap/concat_op.q.out: ########## Review Comment: I guess we can live with this problem for a little while. We don't have many consumers of the JSON format so it is unlikely to have much impact on end-users. The problem is already fixed in https://issues.apache.org/jira/browse/CALCITE-6832 so we can patch Hive shortly after this PR is merged. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: ########## @@ -1177,4 +1185,633 @@ public static List<Integer> getNewRelDistributionKeys( } return distributionKeys; } + + /** + * This class just wraps around a RexNode enables equals/hashCode based on toString.
+ * After CALCITE-2632 this might not be needed anymore */ + public static class RexNodeRef { + + public static Comparator<RexNodeRef> COMPARATOR = + Comparator.comparing((RexNodeRef o) -> o.node.toString()); + private final RexNode node; + + public RexNodeRef(RexNode node) { + this.node = node; + } + + public RexNode getRexNode() { + return node; + } + + @Override + public int hashCode() { + return node.toString().hashCode(); + } + + @Override + public boolean equals(Object o) { + if (o instanceof RexNodeRef) { + RexNodeRef otherRef = (RexNodeRef) o; + return node.toString().equals(otherRef.node.toString()); + } + return false; + } + + @Override + public String toString() { + return "ref for:" + node.toString(); + } + } + + /** + * Represents a constraint. + * + * Example: a=1 + * substr(a,1,2) = concat('asd','xxx') + */ + public static class Constraint { + + private final RexNode exprNode; + private final RexNode constNode; + + public Constraint(RexNode exprNode, RexNode constNode) { + this.exprNode = exprNode; + this.constNode = constNode; + } + + /** + * Interprets argument as a constraint; if not possible returns null. + */ + public static Constraint of(RexNode n) { + if (!(n instanceof RexCall)) { + return null; + } + RexCall call = (RexCall) n; + if (call.getOperator().getKind() != SqlKind.EQUALS) { + return null; + } + RexNode opA = call.operands.get(0); + RexNode opB = call.operands.get(1); + if (RexUtil.isNull(opA) || RexUtil.isNull(opB)) { + // dont try to compare nulls + return null; + } + if (isConstExpr(opA) && isColumnExpr(opB)) { + return new Constraint(opB, opA); + } + if (isColumnExpr(opA) && isConstExpr(opB)) { + return new Constraint(opA, opB); + } + return null; + } + + private static boolean isColumnExpr(RexNode node) { + return !node.getType().isStruct() && !HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + private static boolean isConstExpr(RexNode node) { + return !node.getType().isStruct() && HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + public RexNodeRef getKey() { + return new RexNodeRef(exprNode); + } + + public RexNode getValue() { + return constNode; + } + + } + + /** + * Merge IN clauses, when possible. + */ + public static class RexMergeInClause extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexMergeInClause(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override public RexNode visitCall(RexCall call) { + switch (call.getKind()) { + case AND: + return handleAND(rexBuilder, call); + case OR: + return handleOR(rexBuilder, call); + default: + return super.visitCall(call); + } + } + + private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // Visited nodes + final Set<RexNode> visitedRefs = new LinkedHashSet<>(); + // IN clauses need to be combined by keeping only common elements + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. 
+ final Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + visitedRefs.add(ref); + if (ref.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { + Set<SimilarRexNodeElement> expressions = Sets.newHashSet(); + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(ref); + if (!shareSameType(knownConstants, expressions)) { + return call; + } + knownConstants.retainAll(expressions); + } else { + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + } + operands.remove(i); + --i; + } else if (operand.getKind() == SqlKind.EQUALS) { + Constraint c = Constraint.of(operand); + if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { + continue; + } + visitedRefs.add(c.exprNode); + if (c.exprNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.exprNode); + } + if (c.constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.constNode); + } + if (inLHSExprToRHSExprs.containsKey(c.exprNode)) { + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(c.exprNode); + Collection<SimilarRexNodeElement> nextConstant = Collections.singleton(new SimilarRexNodeElement(c.constNode)); + if (!shareSameType(knownConstants, nextConstant)) { + return call; + } + knownConstants.retainAll(nextConstant); + } else { + inLHSExprToRHSExprs.put(c.exprNode, new SimilarRexNodeElement(c.constNode)); + } + operands.remove(i); + --i; + } + } + // Create IN clauses + final List<RexNode> newOperands = createInClauses(rexBuilder, + visitedRefs, inLHSExprToRHSExprs, inLHSExprToRHSNullableExprs); + newOperands.addAll(operands); + // Return node + return RexUtil.composeConjunction(rexBuilder, newOperands, false); + } + + protected static class SimilarRexNodeElement { + private final RexNode rexNode; + + protected SimilarRexNodeElement(RexNode rexNode) { + this.rexNode = rexNode; + } + + public RexNode getRexNode() { + return rexNode; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (o == null || getClass() != o.getClass()) return false; + SimilarRexNodeElement that = (SimilarRexNodeElement) o; + return equalsWithSimilarType(rexNode, that.rexNode); + } + + private static boolean equalsWithSimilarType(RexNode rexNode1, RexNode rexNode2) { + if (!(rexNode1 instanceof RexLiteral) || !(rexNode2 instanceof RexLiteral)) { + return rexNode1.equals(rexNode2); + } + + RexLiteral rexLiteral1 = (RexLiteral) rexNode1; + RexLiteral rexLiteral2 = (RexLiteral) rexNode2; + + if (rexLiteral1.getValue() == null && rexLiteral2.getValue() == null) { + 
return true; + } + + return rexLiteral1.getValue() != null && rexLiteral1.getValue().compareTo(rexLiteral2.getValue()) == 0 && + rexLiteral1.getType().getSqlTypeName().equals(rexLiteral2.getType().getSqlTypeName()); + } + + @Override + public int hashCode() { + if (rexNode instanceof RexLiteral) { + RexLiteral rexLiteral = (RexLiteral) rexNode; + return Objects.hash(rexLiteral.getValue(), rexLiteral.getType().getSqlTypeName()); + } + return Objects.hash(rexNode); + } + } + + /** + * Check if the type of nodes in the two collections is homogeneous within the collections + * and identical between them. + * @param nodes1 the first collection of nodes + * @param nodes2 the second collection of nodes + * @return true if nodes in both collections is unique and identical, false otherwise + */ + private static boolean shareSameType( + Collection<SimilarRexNodeElement> nodes1, Collection<SimilarRexNodeElement> nodes2) { + return Stream.of(nodes1, nodes2).flatMap(Collection::stream) + .map(n -> n.getRexNode().getType().getSqlTypeName()) + .distinct() + .count() == 1; + } + + private static RexNode handleOR(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping all elements + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenOr(call.getOperands())); + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + for (int j = 1; j < inCall.getOperands().size(); j++) { + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(inCall.getOperands().get(j))); + } + operands.remove(i); + --i; + } + } + // Create IN clauses (fourth parameter is not needed since no expressions were removed) + final List<RexNode> newOperands = createInClauses(rexBuilder, + inLHSExprToRHSExprs.keySet(), inLHSExprToRHSExprs, null); + newOperands.addAll(operands); + // Return node + RexNode result = RexUtil.composeDisjunction(rexBuilder, newOperands, false); + if (!result.getType().equals(call.getType())) { + return rexBuilder.makeCast(call.getType(), result, true); + } + return result; + } + + private static RexNode createResultFromEmptySet(RexBuilder rexBuilder, + RexNode ref, Multimap<RexNode, RexNode> inLHSExprToRHSNullableExprs) { + if (inLHSExprToRHSNullableExprs.containsKey(ref)) { + // We handle possible null values in the expressions. 
+ List<RexNode> nullableExprs = + inLHSExprToRHSNullableExprs.get(ref) + .stream() + .map(n -> rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ImmutableList.of(n))) + .collect(Collectors.toList()); + return RexUtil.composeConjunction(rexBuilder, + ImmutableList.of( + RexUtil.composeDisjunction(rexBuilder, nullableExprs, false), + rexBuilder.makeNullLiteral(rexBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN))), + false); + } + return rexBuilder.makeLiteral(false); + } + + private static List<RexNode> createInClauses(RexBuilder rexBuilder, Set<RexNode> visitedRefs, + Multimap<RexNode, SimilarRexNodeElement> inLHSExprToRHSExprs, Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs) { + final List<RexNode> newExpressions = new ArrayList<>(); + for (RexNode ref : visitedRefs) { + Collection<SimilarRexNodeElement> exprs = inLHSExprToRHSExprs.get(ref); + if (exprs.isEmpty()) { + // Note that Multimap does not keep a key if all its values are removed. + newExpressions.add(createResultFromEmptySet(rexBuilder, ref, inLHSExprToRHSNullableExprs)); + } else if (exprs.size() == 1) { + List<RexNode> newOperands = new ArrayList<>(2); + newOperands.add(ref); + newOperands.add(exprs.iterator().next().getRexNode()); + newExpressions.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, newOperands)); + } else { + List<RexNode> newOperands = new ArrayList<>(exprs.size() + 1); + newOperands.add(ref); + newOperands.addAll(exprs.stream().map(SimilarRexNodeElement::getRexNode).collect(Collectors.toList())); + newExpressions.add(rexBuilder.makeIn(newOperands.get(0), newOperands.subList(1, newOperands.size()))); + } + } + return newExpressions; + } + + } + + /** + * Transforms inequality candidates into [NOT] BETWEEN calls. + * + */ + public static class RexTransformIntoBetween extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexTransformIntoBetween(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + switch (call.getKind()) { + case AND: + return processComparisons(call, SqlKind.LESS_THAN_OR_EQUAL, false); + case OR: + return processComparisons(call, SqlKind.GREATER_THAN, true); + default: + break; + } + } + return node; + } + + /** + * Represents a replacement candidate. + */ + static class BetweenCandidate { + + private final RexNode newNode; + private final RexNode[] oldNodes; + // keeps track if this candidate was already used during replacement + private boolean used; + + public BetweenCandidate(RexNode newNode, RexNode... 
oldNodes) { + this.newNode = newNode; + this.oldNodes = oldNodes; + } + } + + private RexNode processComparisons(RexCall call, SqlKind forwardEdge, boolean invert) { + DiGraph<RexNodeRef, RexCall> g = + buildComparisonGraph(call.getOperands(), forwardEdge); + Map<RexNode, BetweenCandidate> replacedNodes = new IdentityHashMap<>(); + for (RexNodeRef n : g.nodes()) { + Set<RexNodeRef> pred = g.predecessors(n); + Set<RexNodeRef> succ = g.successors(n); + if (!pred.isEmpty() && !succ.isEmpty()) { + RexNodeRef p = pred.iterator().next(); + RexNodeRef s = succ.iterator().next(); + + RexNode between = rexBuilder.makeCall(HiveBetween.INSTANCE, + rexBuilder.makeLiteral(invert), n.getRexNode(), p.getRexNode(), s.getRexNode()); + BetweenCandidate bc = new BetweenCandidate( + between, + g.removeEdge(p, n), + g.removeEdge(n, s)); + + for (RexNode node : bc.oldNodes) { + replacedNodes.put(node, bc); + } + } + } + if (replacedNodes.isEmpty()) { + // no effect + return call; + } + List<RexNode> newOperands = new ArrayList<>(); + for (RexNode o : call.getOperands()) { + BetweenCandidate candidate = replacedNodes.get(o); + if (candidate == null) { + newOperands.add(o); + } else { + if (!candidate.used) { + newOperands.add(candidate.newNode); + candidate.used = true; + } + } + } + + if (newOperands.size() == 1) { + return newOperands.get(0); + } else { + return rexBuilder.makeCall(call.getOperator(), newOperands); + } + } + + /** + * Builds a graph of the given comparison type. + * The graph edges are annotated with the RexNodes representing the comparison. + */ + private DiGraph<RexNodeRef, RexCall> buildComparisonGraph(List<RexNode> operands, SqlKind cmpForward) { + DiGraph<RexNodeRef, RexCall> g = new DiGraph<>(); + for (RexNode node : operands) { + if(!(node instanceof RexCall) ) { + continue; + } + RexCall rexCall = (RexCall) node; + SqlKind kind = rexCall.getKind(); + if (kind == cmpForward) { + RexNode opA = rexCall.getOperands().get(0); + RexNode opB = rexCall.getOperands().get(1); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } else if (kind == cmpForward.reverse()) { + RexNode opA = rexCall.getOperands().get(1); + RexNode opB = rexCall.getOperands().get(0); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } + } + return g; + } + } + + /** + * Transforms OR clauses into IN clauses, when possible. + */ + public static class RexTransformIntoInClause extends RexShuttle { + private final RexBuilder rexBuilder; + private final int minNumORClauses; + + public RexTransformIntoInClause(RexBuilder rexBuilder, int minNumORClauses) { + this.rexBuilder = rexBuilder; + this.minNumORClauses = minNumORClauses; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + if (call.getKind() == SqlKind.OR) { + try { + RexNode newNode = transformIntoInClauseCondition(rexBuilder, + call, minNumORClauses); + if (newNode != null) { + return newNode; + } + } catch (SemanticException e) { + LOG.error("Exception in HivePointLookupOptimizerRule", e); + return call; + } + } + } + return node; + } + + /** + * A group of Constraints. + * + * Examples: + * (a=1 && b=1) + * (a=1) + * + * Note: any rexNode is accepted as constraint; but it might be keyed with the empty key; + * which means it can't be parsed as a constraint for some reason; but for completeness... 
+     *
+     */
+    static class ConstraintGroup {
+      private final Map<RexNodeRef, Constraint> constraints = new HashMap<>();
+      private final RexNode originalRexNode;
+      private final Set<RexNodeRef> key;
+
+      public ConstraintGroup(RexNode rexNode) {
+        originalRexNode = rexNode;
+
+        final List<RexNode> conjunctions = RelOptUtil.conjunctions(rexNode);
+
+        for (RexNode n : conjunctions) {
+
+          Constraint c = Constraint.of(n);
+          if (c == null) {
+            // interpretation failed; make this node opaque
+            key = Collections.emptySet();
+            return;
+          }
+          constraints.put(c.getKey(), c);
+        }
+        if (constraints.size() != conjunctions.size()) {
+          LOG.debug("unexpected situation; giving up on this branch");
+          key = Collections.emptySet();
+          return;
+        }
+        key = constraints.keySet();
+      }
+
+      public List<RexNode> getValuesInOrder(List<RexNodeRef> columns) throws SemanticException {
+        List<RexNode> ret = new ArrayList<>();
+        for (RexNodeRef rexInputRef : columns) {
+          Constraint constraint = constraints.get(rexInputRef);
+          if (constraint == null) {
+            throw new SemanticException("Unable to find constraint which was earlier added.");
+          }
+          ret.add(constraint.getValue());
+        }
+        return ret;
+      }
+    }
+
+    private RexNode transformIntoInClauseCondition(RexBuilder rexBuilder, RexNode condition,
+        int minNumORClauses) throws SemanticException {
+      assert condition.getKind() == SqlKind.OR;
+
+      ImmutableList<RexNode> operands = RexUtil.flattenOr(((RexCall) condition).getOperands());
+      if (operands.size() < minNumORClauses) {
+        // We bail out
+        return null;
+      }
+      List<ConstraintGroup> allNodes = new ArrayList<>();
+      List<ConstraintGroup> processedNodes = new ArrayList<>();
+      for (int i = 0; i < operands.size(); i++) {
+        ConstraintGroup m = new ConstraintGroup(operands.get(i));
+        allNodes.add(m);
+      }
+
+      Multimap<Set<RexNodeRef>, ConstraintGroup> assignmentGroups =
+          Multimaps.index(allNodes, cg -> cg.key);
+
+      for (Map.Entry<Set<RexNodeRef>, Collection<ConstraintGroup>> sa : assignmentGroups.asMap().entrySet()) {
+        // skip opaque
+        if (sa.getKey().isEmpty()) {
+          continue;
+        }
+        // not enough equalities should not be handled
+        if (sa.getValue().size() < 2 || sa.getValue().size() < minNumORClauses) {
+          continue;
+        }
+
+        allNodes.add(new ConstraintGroup(buildInFor(sa.getKey(), sa.getValue())));
+        processedNodes.addAll(sa.getValue());
+      }
+
+      if (processedNodes.isEmpty()) {
+        return null;
+      }
+      allNodes.removeAll(processedNodes);
+      List<RexNode> ops = new ArrayList<>();
+      for (ConstraintGroup mx : allNodes) {
+        ops.add(mx.originalRexNode);
+      }
+      if (ops.size() == 1) {
+        return ops.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.OR, ops);
+      }
+
+    }
+
+    private RexNode buildInFor(Set<RexNodeRef> set, Collection<ConstraintGroup> value) throws SemanticException {
+
+      List<RexNodeRef> columns = new ArrayList<>(set);
+      columns.sort(RexNodeRef.COMPARATOR);
+      List<RexNode> operands = new ArrayList<>();
+
+      List<RexNode> columnNodes = columns.stream().map(RexNodeRef::getRexNode).collect(Collectors.toList());
+      operands.add(useStructIfNeeded(columnNodes));
+      for (ConstraintGroup node : value) {
+        List<RexNode> values = node.getValuesInOrder(columns);
+        operands.add(useStructIfNeeded(values));
+      }
+
+      return rexBuilder.makeIn(operands.get(0), operands.subList(1, operands.size()));
+    }
+
+    private RexNode useStructIfNeeded(List<? extends RexNode> columns) {
+      // Create STRUCT clause
+      if (columns.size() == 1) {
+        return columns.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.ROW, columns);
+      }
+    }
+
+  }
+
+  public static RexNode transformOrToInAndInequalityToBetween(
+      RexBuilder rexBuilder, RexNode condition, int minNumORClauses) {
+    // 1. We try to transform possible candidates
+    RexTransformIntoInClause transformIntoInClause = new RexTransformIntoInClause(rexBuilder, minNumORClauses);
+    RexNode newCondition = transformIntoInClause.apply(condition);
+
+    // 2. We merge IN expressions
+    RexMergeInClause mergeInClause = new RexMergeInClause(rexBuilder);
+    newCondition = mergeInClause.apply(newCondition);
+
+    // 3. Close BETWEEN expressions if possible
+    RexTransformIntoBetween t = new RexTransformIntoBetween(rexBuilder);
+    newCondition = t.apply(newCondition);
+    return newCondition;
+  }

Review Comment:
   Logged https://issues.apache.org/jira/browse/HIVE-28907 for reworking the rule after the introduction of the SEARCH operator.
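   To make the three steps concrete, here is an illustrative SQL-level sketch of the rewrites (the table `t` and its columns are hypothetical; the rule itself operates on RexNodes, and the step-2 behavior is inferred from the merge/empty-set code shown above):

       -- 1. RexTransformIntoInClause: OR-ed (multi-column) equalities fold into IN,
       --    once the minNumORClauses threshold (HIVE_POINT_LOOKUP_OPTIMIZER_MIN) is met.
       SELECT * FROM t WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2);
       --    becomes: (a, b) IN ((1, 1), (2, 2))     -- ROW/STRUCT form via useStructIfNeeded

       -- 2. RexMergeInClause: IN sets over the same expression are merged.
       SELECT * FROM t WHERE a IN (1, 2) AND a IN (2, 3);
       --    becomes: a IN (2)
       SELECT * FROM t WHERE a IN (1) AND a IN (2);
       --    empty intersection: FALSE, except NULL when a IS NULL
       --    (the createResultFromEmptySet branch above)

       -- 3. RexTransformIntoBetween: comparison chains are closed into [NOT] BETWEEN.
       SELECT * FROM t WHERE 1 <= x AND x <= 10;   -- becomes x BETWEEN 1 AND 10
       SELECT * FROM t WHERE x < 1 OR x > 10;      -- becomes x NOT BETWEEN 1 AND 10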
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java: ##########
@@ -138,6 +141,17 @@ public boolean isSynthetic() {

   @Override
   public RelWriter explainTerms(RelWriter pw) {
+    // Remove this if block after upgrading Calcite to 1.35+

Review Comment:
   Not relevant with latest changes. Resolving.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java: ##########
@@ -128,6 +129,11 @@ public HiveJdbcImplementor(SqlDialect dialect, JavaTypeFactory typeFactory) {
     return result(join, leftResult, rightResult);
   }

+  public Result visit(JdbcTableScan scan) {
+    return result(scan.jdbcTable.tableName(),
+        ImmutableList.of(Clause.FROM), scan, null);
+  }
+

Review Comment:
   Since we are just restoring what the breaking change in CALCITE-4640 removed, introducing this method here makes sense.

########## ql/src/test/results/clientpositive/perf/tpcds30tb/tez/cbo_query13.q.out: ##########
@@ -5,19 +5,19 @@ HiveProject(_o__c0=[/(CAST($0):DOUBLE, $1)], _o__c1=[CAST(/($2, $3)):DECIMAL(11,
   HiveJoin(condition=[=($1, $18)], joinType=[inner], algorithm=[none], cost=[not available])
     HiveJoin(condition=[AND(=($2, $14), OR(AND($15, $7), AND($16, $8), AND($17, $9)))], joinType=[inner], algorithm=[none], cost=[not available])
       HiveJoin(condition=[=($6, $13)], joinType=[inner], algorithm=[none], cost=[not available])
-        HiveProject(ss_cdemo_sk=[$3], ss_hdemo_sk=[$4], ss_addr_sk=[$5], ss_quantity=[$9], ss_ext_sales_price=[$14], ss_ext_wholesale_cost=[$15], ss_sold_date_sk=[$22], BETWEEN=[BETWEEN(false, $21, 100:DECIMAL(12, 2), 200:DECIMAL(12, 2))], BETWEEN9=[BETWEEN(false, $21, 150:DECIMAL(12, 2), 300:DECIMAL(12, 2))], BETWEEN10=[BETWEEN(false, $21, 50:DECIMAL(12, 2), 250:DECIMAL(12, 2))], BETWEEN11=[BETWEEN(false, $12, 100:DECIMAL(3, 0), 150:DECIMAL(3, 0))], BETWEEN12=[BETWEEN(false, $12, 50:DECIMAL(2, 0), 100:DECIMAL(3, 0))], BETWEEN13=[BETWEEN(false, $12, 150:DECIMAL(3, 0), 200:DECIMAL(3, 0))])
-        HiveFilter(condition=[AND(IS NOT NULL($3), IS NOT NULL($5), IS NOT NULL($4), IS NOT NULL($6), OR(<=(100:DECIMAL(3, 0), $12), <=($12, 150:DECIMAL(3, 0)), <=(50:DECIMAL(2, 0), $12), <=($12, 100:DECIMAL(3, 0)), <=(150:DECIMAL(3, 0), $12), <=($12, 200:DECIMAL(3, 0))), OR(<=(100:DECIMAL(12, 2), $21), <=($21, 200:DECIMAL(12, 2)), <=(150:DECIMAL(12, 2), $21), <=($21, 300:DECIMAL(12, 2)), <=(50:DECIMAL(12, 2), $21), <=($21, 250:DECIMAL(12, 2))), IS NOT NULL($22))])
+        HiveProject(ss_cdemo_sk=[$3], ss_hdemo_sk=[$4], ss_addr_sk=[$5], ss_quantity=[$9], ss_ext_sales_price=[$14], ss_ext_wholesale_cost=[$15], ss_sold_date_sk=[$22], EXPR$0=[BETWEEN(false, $21, 100:DECIMAL(12, 2), 200:DECIMAL(12, 2))], EXPR$1=[BETWEEN(false, $21, 150:DECIMAL(12, 2), 300:DECIMAL(12, 2))], EXPR$2=[BETWEEN(false, $21, 50:DECIMAL(12, 2), 250:DECIMAL(12, 2))], EXPR$5=[BETWEEN(false, $12, 100:DECIMAL(3, 0), 150:DECIMAL(3, 0))], EXPR$8=[BETWEEN(false, $12, 50:DECIMAL(3, 0), 100:DECIMAL(3, 0))], EXPR$11=[BETWEEN(false, $12, 150:DECIMAL(3, 0), 200:DECIMAL(3, 0))])
+        HiveFilter(condition=[AND(IS NOT NULL($21), IS NOT NULL($12), IS NOT NULL($3), IS NOT NULL($5), IS NOT NULL($4), IS NOT NULL($6), IS NOT NULL($22))])

Review Comment:
   The disjunction is simplified to IS NOT NULL as discussed under: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.n5m7m06tr31a
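   A quick sanity check of why the simplification is sound (the values are arbitrary): each disjunction pairs a lower bound and an upper bound that together cover the whole domain, so it only ever rejects NULLs.

       -- For any non-NULL x, x >= 100 OR x <= 150 is a tautology,
       -- so OR(<=(100, $12), <=($12, 150), ...) reduces to IS NOT NULL($12).
       SELECT 200 >= 100 OR 200 <= 150;                              -- true
       SELECT 0 >= 100 OR 0 <= 150;                                  -- true
       SELECT CAST(NULL AS INT) >= 100 OR CAST(NULL AS INT) <= 150;  -- NULL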
########## iceberg/iceberg-handler/src/test/results/positive/update_iceberg_copy_on_write_partitioned.q.out: ##########
@@ -71,10 +71,10 @@ STAGE PLANS:
           Map Operator Tree:
               TableScan
                 alias: tbl_ice
-                filterExpr: (((b <> 'one') and (b <> 'four') and (a <> 22)) or ((a = 22) or (b) IN ('one', 'four')) is null) (type: boolean)
+                filterExpr: (((b) IN ('four', 'one') or (a = 22)) is null or (((b < 'four') or ((b > 'four') and (b < 'one')) or (b > 'one')) and (a <> 22))) (type: boolean)

Review Comment:
   This is a direct consequence of the fact that the `<>` operator is simplified to `SEARCH`. When we expand SEARCH we end up with this inequality of ranges. The new expression is slightly less efficient than the previous one, but we can probably live with it. I logged https://issues.apache.org/jira/browse/HIVE-28911 as a follow-up improvement.
   It is also discussed here: https://docs.google.com/document/d/167dOnX0LO-zI4Pjt-wfs1ILqjgCq2QJxtRA-RiJcdv0/edit?tab=t.0#heading=h.nfxo4n2hxorj
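   To spell out the mechanics (a sketch, not the PR's exact code path): the conjunction of `<>` predicates is collapsed into a single SEARCH whose Sarg is a complement range set, and expanding that Sarg back into comparisons yields the range disjunction seen in the new filterExpr.

       -- b <> 'one' AND b <> 'four' is represented as SEARCH(b, Sarg[
       --   (-inf..'four'), ('four'..'one'), ('one'..+inf)]), and
       -- RexUtil.expandSearch renders the Sarg back as range comparisons:
       SELECT * FROM tbl_ice
       WHERE (b < 'four') OR ((b > 'four') AND (b < 'one')) OR (b > 'one');
       -- equivalent to, but bulkier than, the original conjunction of <> predicates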
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: ##########
@@ -683,6 +687,20 @@ public static ImmutableList<RexNode> getPredsNotPushedAlready(Set<String> predicatesToExclude
     predicatesToExclude.addAll(predicatesInSubtree);
     return newConjuncts.build();
   }
+
+  private static RexNode simplify(RexSimplify simplifier, RexNode node) {
+    RexNode result = node;
+    int maxTries = 5;
+    for (int i = 0; i < maxTries; i++) {
+      RexNode simplified = simplifier.simplify(result);
+      if (simplified.equals(result)) {
+        break;
+      }
+      result = simplified;
+    }
+
+    return result;

Review Comment:
   Fixed by https://github.com/apache/hive/pull/5196/commits/f24c0b9cc86248c231795db9f41c61c8aab5a139. Marking this as resolved.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: ##########
@@ -77,6 +77,7 @@
 import org.apache.calcite.util.mapping.Mappings;
 import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
 import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule;

Review Comment:
   Changes were reverted in the latest commits, so this discussion is not relevant anymore.

########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ##########
@@ -869,6 +875,12 @@ public ASTNode visitInputRef(RexInputRef inputRef) {

   @Override
   public ASTNode visitLiteral(RexLiteral literal) {
+    if (!RexUtil.isNull(literal) && literal.getType().isStruct()) {
+      return rexBuilder
+          .makeCall(SqlStdOperatorTable.ROW, (List<? extends RexNode>) literal.getValue())
+          .accept(this);
+    }
+

Review Comment:
   With the latest changes ASTConverter is left completely unchanged, so this discussion is not relevant anymore. SEARCH is expanded before hitting the ASTConverter.

########## ql/src/test/results/clientpositive/llap/pcs.q.out: ##########
@@ -1286,16 +1293,18 @@
 PREHOOK: type: QUERY
 PREHOOK: Input: default@pcs_t1
 PREHOOK: Input: default@pcs_t1@ds=2000-04-08
 PREHOOK: Input: default@pcs_t1@ds=2000-04-09
+PREHOOK: Input: default@pcs_t1@ds=2000-04-10
 #### A masked pattern was here ####
 POSTHOOK: query: explain extended select ds from pcs_t1 where struct(ds, key, rand(100)) in (struct('2000-04-08',1,0.2), struct('2000-04-09',2,0.3))
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@pcs_t1
 POSTHOOK: Input: default@pcs_t1@ds=2000-04-08
 POSTHOOK: Input: default@pcs_t1@ds=2000-04-09
+POSTHOOK: Input: default@pcs_t1@ds=2000-04-10
 #### A masked pattern was here ####
 OPTIMIZED SQL: SELECT `ds`
 FROM `default`.`pcs_t1`
-WHERE (`ds`, `key`, RAND(100)) IN ( ('2000-04-08', 1, 0.2), ('2000-04-09', 2, 0.3))
+WHERE (`ds`, `key`, RAND(100)) = ('2000-04-08', 1, 0.2) OR (`ds`, `key`, RAND(100)) = ('2000-04-09', 2, 0.3)

Review Comment:
   I restored the test to its original state since the latest commits fixed the problem with partition pruning. Marking this as resolved.

########## ql/src/test/results/clientpositive/llap/pcs.q.out: ##########
@@ -1355,6 +1364,40 @@ STAGE PLANS:
                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                  input format: org.apache.hadoop.mapred.TextInputFormat
+                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+                  properties:
+                    bucketing_version 2
+                    column.name.delimiter ,
+                    columns key,value
+                    columns.comments
+                    columns.types int:string
+#### A masked pattern was here ####
+                    name default.pcs_t1
+                    partition_columns ds
+                    partition_columns.types string
+                    serialization.format 1
+                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                name: default.pcs_t1
+              name: default.pcs_t1
+            Partition
+              input format: org.apache.hadoop.mapred.TextInputFormat
+              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+              partition values:
+                ds 2000-04-10
+              properties:
+                column.name.delimiter ,
+                columns key,value
+                columns.types int:string
+#### A masked pattern was here ####
+                name default.pcs_t1
+                partition_columns ds
+                partition_columns.types string
+                serialization.format 1
+                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+

Review Comment:
   Latest commits fixed the problem with partition pruning so marking this as resolved.