thomasrebele commented on code in PR #6477:
URL: https://github.com/apache/hive/pull/6477#discussion_r3274028088


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -603,6 +605,151 @@ private Optional<Float> extractLiteral(SqlTypeName typeName, Object boundValueOb
     return Optional.of(value);
   }
 
+  private double computeSearchSelectivity(RexCall search) {
+    return new SearchSelectivityHelper<>(search).compute();
+  }
+
+  /**
+   * Similar to {@link SearchTransformer}, but computing the selectivity of the expression.
+   */
+  private final class SearchSelectivityHelper<C extends Comparable<C>> {
+    private final RexNode ref;
+    private final Sarg<C> sarg;
+    private final RelDataType operandType;
+
+    private SearchSelectivityHelper(RexCall search) {
+      ref = search.getOperands().get(0);
+      RexLiteral literal = (RexLiteral) search.operands.get(1);
+      sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
+      operandType = literal.getType();
+    }
+
+    private RexNode makeLiteral(C value) {
+      return rexBuilder.makeLiteral(value, operandType, true, true);
+    }
+
+    private double compute() {
+      final List<Double> selectivityList = new ArrayList<>();
+      final List<RexNode> inLiterals = new ArrayList<>();
+
+      if (sarg.nullAs == RexUnknownAs.TRUE) {
+        selectivityList.add(
+            rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this));
+      }
+
+      RangeSets.forEach(sarg.rangeSet, new RangeSets.Consumer<C>() {

Review Comment:
   There are a few places in Calcite that iterate over `sarg.rangeSet.asRanges()` without the Consumer:
   * [RexUtil#sargRef](https://github.com/apache/calcite/blob/a8345ae8ea8ba951d2663db0cf9637f5884db37b/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L638-L653)
   * [DruidDateTimeUtils#leafToRanges](https://github.com/apache/calcite/blob/a8345ae8ea8ba951d2663db0cf9637f5884db37b/druid/src/main/java/org/apache/calcite/adapter/druid/DruidDateTimeUtils.java#L246)
   
   Where Calcite does use a `RangeSets.Consumer<C>`, each range type maps easily to a distinct action. Hive, by contrast, always accesses `sarg.rangeSet` through a `RangeSets.Consumer`. However, I could only find one such usage, and it was introduced by @soumyakanti3578, so I'm not sure whether the opinion is unbiased :) The usages (including those through the Consumer) can be found in the IDE by looking at the usages of `org.apache.calcite.util.Sarg#rangeSet`.
   
   I tried simplifying the code a bit, see https://github.com/thomasrebele/hive/commit/29cb98b2b0aba10a7b29749aef3e770dae667433. It's a bit less efficient than Ruben's proposal, but it might be a bit more readable.
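
To make the two iteration styles discussed above concrete, here is a minimal stdlib-only sketch. `SimpleRange`, `RangeVisitor`, and `forEach` are hypothetical stand-ins written for this illustration; they are not Guava's `Range` or Calcite's `RangeSets.Consumer`, and the code is not taken from the PR or the linked commit.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; not the Guava/Calcite types.
final class RangeIterationSketch {

  /** Simplified range over doubles; a null bound means "unbounded" on that side. */
  static final class SimpleRange {
    final Double lower;
    final Double upper;

    SimpleRange(Double lower, Double upper) {
      this.lower = lower;
      this.upper = upper;
    }

    /** A point range has equal, non-null bounds (it would map to an IN literal). */
    boolean isPoint() {
      return lower != null && lower.equals(upper);
    }
  }

  /** Callback style: one method per kind of range (assumed shape, much smaller than a real consumer). */
  interface RangeVisitor {
    void point(double value);
    void bounded(double lower, double upper);
    void unbounded(SimpleRange range);
  }

  /** Dispatches each range to the matching visitor method. */
  static void forEach(List<SimpleRange> ranges, RangeVisitor visitor) {
    for (SimpleRange r : ranges) {
      if (r.isPoint()) {
        visitor.point(r.lower);
      } else if (r.lower != null && r.upper != null) {
        visitor.bounded(r.lower, r.upper);
      } else {
        visitor.unbounded(r);
      }
    }
  }

  /** Style 1: callback-based; every range kind needs a method, even when ignored. */
  static int countPointsWithVisitor(List<SimpleRange> ranges) {
    final int[] count = {0};
    forEach(ranges, new RangeVisitor() {
      @Override public void point(double value) { count[0]++; }
      @Override public void bounded(double lower, double upper) { }
      @Override public void unbounded(SimpleRange range) { }
    });
    return count[0];
  }

  /** Style 2: plain loop over the ranges with an explicit check. */
  static int countPointsWithLoop(List<SimpleRange> ranges) {
    int count = 0;
    for (SimpleRange r : ranges) {
      if (r.isPoint()) {
        count++;
      }
    }
    return count;
  }
}
```

The callback style pays off when each range kind maps to its own action; when only one or two kinds matter, the plain loop tends to be shorter.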



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

