zabetak commented on code in PR #6503:
URL: https://github.com/apache/hive/pull/6503#discussion_r3356327939


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -628,14 +628,23 @@ private RexNode makeLiteral(C value) {
     private double compute() {
       final List<RexNode> inLiterals = new ArrayList<>();
       final List<Double> rangeSelectivities = new ArrayList<>();
-      for (Range<C> range : sarg.rangeSet.asRanges()) {
-        if (!range.hasLowerBound() && !range.hasUpperBound()) {
-          return 1.0; // "all" range
+      final List<Double> searchSelectivities = new ArrayList<>();
+
+      if (sarg.isComplementedPoints()) {
+        // Generate 'ref <> value1 AND ... AND ref <> valueN'
+        List<RexNode> notEq = sarg.rangeSet.complement().asRanges().stream()
+            .map(range -> rexBuilder.makeCall(SqlStdOperatorTable.NOT_EQUALS, 
ref, makeLiteral(range.lowerEndpoint())))
+            .toList();
+        searchSelectivities.add(RexUtil.composeConjunction(rexBuilder, 
notEq).accept(FilterSelectivityEstimator.this));
+      } else {

Review Comment:
   Does this form lead to better selectivity estimates than what we had before? 
If not then we don't necessarily need to change it. The purpose of this code is 
to compute a good selectivity for the SEARCH predicate not to transform it to 
something else.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/SearchTransformer.java:
##########
@@ -72,30 +72,44 @@ public SearchTransformer(RexBuilder rexBuilder, RexCall 
search, final RexUnknown
     this.unknownContext = unknownContext;
   }
 
+  /**
+   * Transforms the SEARCH expression into an equivalent RexNode expression.
+   * Warning: when called from a shuttle, callers of this method should 
consider flattening AND/OR expressions
+   * afterward, to get the same result as applying {@link 
SearchTransformer.Shuttle}.
+   */
   public RexNode transform() {
     PerfLogger perfLogger = SessionState.getPerfLogger();
     perfLogger.perfLogBegin(this.getClass().getName(), 
PerfLogger.SEARCH_TRANSFORMER);
 
-    RangeConverter<C> consumer = new RangeConverter<>(rexBuilder, operandType, 
ref);
-    RangeSets.forEach(sarg.rangeSet, consumer);
-
     List<RexNode> orList = new ArrayList<>();
     if (sarg.nullAs == RexUnknownAs.TRUE && unknownContext != 
RexUnknownAs.TRUE) {
       orList.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref));
     }
-    switch (consumer.inLiterals.size()) {
-    case 0:
-      break;
-    case 1:
-      orList.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, ref, 
consumer.inLiterals.get(0)));
-      break;
-    default:
-      List<RexNode> operands = new ArrayList<>(consumer.inLiterals.size() + 1);
-      operands.add(ref);
-      operands.addAll(consumer.inLiterals);
-      orList.add(rexBuilder.makeCall(HiveIn.INSTANCE, operands));
+
+    if (sarg.isComplementedPoints()) {

Review Comment:
   When the sarg is not strictly a complement then I guess we cannot handle it. 
Examples:
   ```sql
   ... WHERE NOT ( x = 10 OR x = 20 OR (x > 30 AND x < 50))
   ... WHERE (x <> 10 AND x <> 20 AND (x <= 30 OR x >= 50))
   ```
   One way to cover those would be to apply the existing `RangeConverter` on 
`sarg.rangeSet.complement()` and invert the EQUALS and HiveIn operators in the 
switch. The challenge here is when to use the `sarg.rangeSet` and when its 
complement. A naive choice could be to base the decision on 
`sarg.rangeSet.asRanges().size()` or something along these lines. 
   
   Anyways, the change here is fine as it is so we can log a another ticket and 
follow-up there if its worth to do it or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to