rubenada commented on code in PR #6503:
URL: https://github.com/apache/hive/pull/6503#discussion_r3363693382


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -628,14 +628,23 @@ private RexNode makeLiteral(C value) {
     private double compute() {
       final List<RexNode> inLiterals = new ArrayList<>();
       final List<Double> rangeSelectivities = new ArrayList<>();
-      for (Range<C> range : sarg.rangeSet.asRanges()) {
-        if (!range.hasLowerBound() && !range.hasUpperBound()) {
-          return 1.0; // "all" range
+      final List<Double> searchSelectivities = new ArrayList<>();
+
+      if (sarg.isComplementedPoints()) {
+        // Generate 'ref <> value1 AND ... AND ref <> valueN'
+        List<RexNode> notEq = sarg.rangeSet.complement().asRanges().stream()
+            .map(range -> rexBuilder.makeCall(SqlStdOperatorTable.NOT_EQUALS, ref, makeLiteral(range.lowerEndpoint())))
+            .toList();
+        searchSelectivities.add(RexUtil.composeConjunction(rexBuilder, notEq).accept(FilterSelectivityEstimator.this));
+      } else {

Review Comment:
   Hmm, you're right: the original version (which uses histograms) would 
give a more accurate estimate than the proposed one with NOT_EQUALS (which is 
estimated simply as (ndv - 1) / ndv, and can be quite off). However, histograms 
may not be available in general, in which case the original version would fall 
back to a hardcoded selectivity, whereas the less accurate NOT_EQUALS approach 
relies on the more widely available ndv estimate (and that estimate, although 
not perfect, would still be better than the original version's hardcoded 
value).
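
   To make the trade-off concrete, here is a minimal, self-contained sketch. 
It is not Hive code: the class and method names are hypothetical, the 
per-value frequency map is only a toy stand-in for histogram statistics, and 
multiplying the per-conjunct selectivities assumes the conjuncts are treated 
as independent.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hive.
final class NotEqualsSelectivitySketch {

  // NDV-based estimate for "col <> v1 AND ... AND col <> vN".
  // Each conjunct contributes (ndv - 1) / ndv; the conjuncts are multiplied
  // under an independence assumption (an assumption of this sketch).
  static double ndvEstimate(long ndv, int excludedValues) {
    if (ndv <= 1) {
      return 1.0; // guard chosen for this sketch, avoids a zero or negative ratio
    }
    double single = (ndv - 1.0) / ndv;
    return Math.pow(single, excludedValues);
  }

  // Frequency-based selectivity: fraction of rows whose value is not excluded.
  // The map of value -> row count is a toy stand-in for real histogram stats.
  static double exactSelectivity(Map<String, Long> valueCounts, Set<String> excluded) {
    long total = 0;
    long kept = 0;
    for (Map.Entry<String, Long> e : valueCounts.entrySet()) {
      total += e.getValue();
      if (!excluded.contains(e.getKey())) {
        kept += e.getValue();
      }
    }
    return total == 0 ? 1.0 : (double) kept / total;
  }

  public static void main(String[] args) {
    // Heavily skewed column: "a" covers 97% of the rows.
    Map<String, Long> counts = Map.of("a", 970L, "b", 10L, "c", 10L, "d", 10L);
    Set<String> excluded = Set.of("a");

    System.out.println(ndvEstimate(counts.size(), excluded.size())); // 0.75
    System.out.println(exactSelectivity(counts, excluded));          // 0.03
  }
}
```

   On this skewed data the ndv-based formula reports 0.75 while the true 
fraction of surviving rows is 0.03, which is the kind of gap I mean by "quite 
off". On uniformly distributed data the two numbers would be much closer.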
   
   Having said that, I guess we should try to aim for the better solution, and 
trust that statistics would be available, so I lean towards reverting the 
change in this file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

