Re: [PR] HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates [hive]

via GitHub Mon, 18 May 2026 16:01:17 -0700


soumyakanti3578 commented on code in PR #6477:
URL: https://github.com/apache/hive/pull/6477#discussion_r3262678454



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -603,6 +605,151 @@ private Optional<Float> extractLiteral(SqlTypeName typeName, Object boundValueOb
     return Optional.of(value);
   }
 
+  private double computeSearchSelectivity(RexCall search) {
+    return new SearchSelectivityHelper<>(search).compute();
+  }
+
+  /**
+   * Similar to {@link SearchTransformer}, but computing the selectivity of the expression.
+   */
+  private final class SearchSelectivityHelper<C extends Comparable<C>> {
+    private final RexNode ref;
+    private final Sarg<C> sarg;
+    private final RelDataType operandType;
+
+    private SearchSelectivityHelper(RexCall search) {
+      ref = search.getOperands().get(0);
+      RexLiteral literal = (RexLiteral) search.operands.get(1);
+      sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
+      operandType = literal.getType();
+    }
+
+    private RexNode makeLiteral(C value) {
+      return rexBuilder.makeLiteral(value, operandType, true, true);
+    }
+
+    private double compute() {
+      final List<Double> selectivityList = new ArrayList<>();
+      final List<RexNode> inLiterals = new ArrayList<>();
+
+      if (sarg.nullAs == RexUnknownAs.TRUE) {
+        selectivityList.add(
+            rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this));
+      }
+
+      RangeSets.forEach(sarg.rangeSet, new RangeSets.Consumer<C>() {

Review Comment:
   While I agree that the refactored code is now much smaller and simpler, I feel the previous version was better organized and more readable: the new version is one large method with multiple `if`/`else` blocks.
   
   Moreover, I see that implementing `RangeSets.Consumer<C>` is the established approach both in Hive (`org.apache.hadoop.hive.ql.optimizer.calcite.RangeConverter`) and in Calcite (several places). If the new code is not significantly more performant than the earlier version, then maybe we should keep things familiar?
   
   Another small benefit of implementing `RangeSets.Consumer<C>` is that the code is easy to find from an IDE by searching for all implementations of the interface.
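
   To make the trade-off concrete, here is a minimal, self-contained sketch of the callback style being discussed. It does not use Calcite or Hive: `RangeVisitor`, `SimpleRange`, and `UniformSelectivityVisitor` are hypothetical names, the interface is only a trimmed-down stand-in for the per-range-shape callbacks of `RangeSets.Consumer<C>`, and the uniform-distribution arithmetic is a toy assumption, not the estimator in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down callback interface: one method per range shape.
interface RangeVisitor {
  void singleton(double value);
  void closed(double lower, double upper);
  void atLeast(double lower);
  void atMost(double upper);
}

// Hypothetical range type. The shape dispatch (the if/else chain) lives here, once.
final class SimpleRange {
  private final Double lower; // null means unbounded below
  private final Double upper; // null means unbounded above

  SimpleRange(Double lower, Double upper) {
    if (lower == null && upper == null) {
      throw new IllegalArgumentException("at least one bound is required in this sketch");
    }
    this.lower = lower;
    this.upper = upper;
  }

  void accept(RangeVisitor visitor) {
    if (lower == null) {
      visitor.atMost(upper);
    } else if (upper == null) {
      visitor.atLeast(lower);
    } else if (lower.equals(upper)) {
      visitor.singleton(lower);
    } else {
      visitor.closed(lower, upper);
    }
  }
}

// Toy estimator (assumption): values are uniformly distributed over [min, max].
final class UniformSelectivityVisitor implements RangeVisitor {
  private final double min;
  private final double max;
  private final long distinctValues;
  private final List<Double> selectivities = new ArrayList<>();

  UniformSelectivityVisitor(double min, double max, long distinctValues) {
    this.min = min;
    this.max = max;
    this.distinctValues = distinctValues;
  }

  // Fraction of [min, max] covered by [lower, upper], clamped to the column bounds.
  private double fraction(double lower, double upper) {
    double lo = Math.max(lower, min);
    double hi = Math.min(upper, max);
    return hi <= lo ? 0.0 : (hi - lo) / (max - min);
  }

  @Override public void singleton(double value) {
    selectivities.add(value < min || value > max ? 0.0 : 1.0 / distinctValues);
  }

  @Override public void closed(double lower, double upper) {
    selectivities.add(fraction(lower, upper));
  }

  @Override public void atLeast(double lower) {
    selectivities.add(fraction(lower, max));
  }

  @Override public void atMost(double upper) {
    selectivities.add(fraction(min, upper));
  }

  // Sum of the per-range estimates, capped at 1.0.
  double total() {
    double sum = 0.0;
    for (double s : selectivities) {
      sum += s;
    }
    return Math.min(1.0, sum);
  }
}
```

   In this shape each range kind gets its own small method, and every implementation of the interface shows up in an IDE's type hierarchy; the cost is more boilerplate than a single method with inline branches.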
   
   BTW, I am willing to approve this as-is, but just wanted to hear both of 
your thoughts on this.


