rubenada commented on code in PR #6503:
URL: https://github.com/apache/hive/pull/6503#discussion_r3363693382
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -628,14 +628,23 @@ private RexNode makeLiteral(C value) {
private double compute() {
final List<RexNode> inLiterals = new ArrayList<>();
final List<Double> rangeSelectivities = new ArrayList<>();
- for (Range<C> range : sarg.rangeSet.asRanges()) {
- if (!range.hasLowerBound() && !range.hasUpperBound()) {
- return 1.0; // "all" range
+ final List<Double> searchSelectivities = new ArrayList<>();
+
+ if (sarg.isComplementedPoints()) {
+ // Generate 'ref <> value1 AND ... AND ref <> valueN'
+ List<RexNode> notEq = sarg.rangeSet.complement().asRanges().stream()
+ .map(range -> rexBuilder.makeCall(SqlStdOperatorTable.NOT_EQUALS, ref, makeLiteral(range.lowerEndpoint())))
+ .toList();
+ searchSelectivities.add(RexUtil.composeConjunction(rexBuilder, notEq).accept(FilterSelectivityEstimator.this));
+ } else {
Review Comment:
Hmm, you're right: the original version (which uses histograms) would give a
more accurate estimate than the proposed one with NOT_EQUALS (which is
estimated simply as (ndv - 1) / ndv and can be quite off). However, histograms
may not be available in general, in which case the original version would fall
back to a hardcoded selectivity, whereas the NOT_EQUALS approach relies on an
ndv estimate that is more generally available (and that estimate, although
imperfect, would still be better than the hardcoded value of the original
version).
Having said that, I guess we should aim for the better solution and trust that
statistics will be available, so I lean towards reverting the change in this
file.
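
To make the trade-off concrete, here is a minimal standalone sketch (not Hive code). It assumes each `ref <> value` predicate is estimated as (ndv - 1) / ndv and that the conjunction multiplies those estimates. The class name, method names, and sample numbers are all hypothetical, and the frequency-based method is only a stand-in for what histogram statistics could provide.

```java
final class NotEqualsSelectivitySketch {

  /**
   * Estimate for 'ref <> v1 AND ... AND ref <> vN' using only ndv.
   * Assumption: values are uniformly distributed and the per-predicate
   * estimates (ndv - 1) / ndv are multiplied together.
   */
  public static double ndvBased(long ndv, int excludedValues) {
    if (ndv <= 0) {
      throw new IllegalArgumentException("ndv must be positive");
    }
    double single = (ndv - 1.0) / ndv;
    return Math.pow(single, excludedValues);
  }

  /**
   * Estimate for the same predicate when the fraction of rows holding each
   * excluded value is known (hypothetical stand-in for histogram data).
   */
  public static double frequencyBased(double[] excludedFrequencies) {
    double excluded = 0.0;
    for (double f : excludedFrequencies) {
      excluded += f;
    }
    // Clamp to [0, 1] in case the supplied frequencies are imprecise.
    return Math.max(0.0, Math.min(1.0, 1.0 - excluded));
  }

  public static void main(String[] args) {
    // Made-up skewed column: 10 distinct values, two of which
    // cover 60% and 20% of the rows.
    System.out.println("ndv-based:       " + ndvBased(10, 2));
    System.out.println("frequency-based: " + frequencyBased(new double[] {0.6, 0.2}));
  }
}
```

With these made-up numbers the ndv-based estimate is roughly 0.81 while the frequency-based one is roughly 0.2, which shows how far off the ndv-only estimate can be on skewed data.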
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]