thomasrebele commented on code in PR #6477:
URL: https://github.com/apache/hive/pull/6477#discussion_r3274028088
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -603,6 +605,151 @@ private Optional<Float> extractLiteral(SqlTypeName typeName, Object boundValueOb
return Optional.of(value);
}
+ private double computeSearchSelectivity(RexCall search) {
+ return new SearchSelectivityHelper<>(search).compute();
+ }
+
+ /**
+ * Similar to {@link SearchTransformer}, but computing the selectivity of the expression.
+ */
+ private final class SearchSelectivityHelper<C extends Comparable<C>> {
+ private final RexNode ref;
+ private final Sarg<C> sarg;
+ private final RelDataType operandType;
+
+ private SearchSelectivityHelper(RexCall search) {
+ ref = search.getOperands().get(0);
+ RexLiteral literal = (RexLiteral) search.operands.get(1);
+ sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
+ operandType = literal.getType();
+ }
+
+ private RexNode makeLiteral(C value) {
+ return rexBuilder.makeLiteral(value, operandType, true, true);
+ }
+
+ private double compute() {
+ final List<Double> selectivityList = new ArrayList<>();
+ final List<RexNode> inLiterals = new ArrayList<>();
+
+ if (sarg.nullAs == RexUnknownAs.TRUE) {
+ selectivityList.add(
+ rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this));
+ }
+
+ RangeSets.forEach(sarg.rangeSet, new RangeSets.Consumer<C>() {
Review Comment:
There are a few places in Calcite that iterate over
`sarg.rangeSet.asRanges()` without a `RangeSets.Consumer`:
* [RexUtil#sargRef](https://github.com/apache/calcite/blob/a8345ae8ea8ba951d2663db0cf9637f5884db37b/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L638-L653)
* [DruidDateTimeUtils#leafToRanges](https://github.com/apache/calcite/blob/a8345ae8ea8ba951d2663db0cf9637f5884db37b/druid/src/main/java/org/apache/calcite/adapter/druid/DruidDateTimeUtils.java#L246)

Where Calcite does use a `RangeSets.Consumer<C>`, there is an easy mapping
from the different range types to distinct actions. Hive always accesses
`sarg.rangeSet` through a `RangeSets.Consumer`. However, I could only find
one usage, and it was introduced by @soumyakanti3578, so I'm not sure whether
the opinion is unbiased :) The usages (including those from the Consumer) can
be found in the IDE by looking at the usages of
`org.apache.calcite.util.Sarg#rangeSet`.
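
To make the comparison between the two styles concrete, here is a minimal sketch that uses only JDK types. `SimpleRange`, `RangeVisitor`, `forEach`, and the two `pointsVia...` helpers are hypothetical stand-ins invented for illustration; they are not the Guava or Calcite classes and this is not the code from the PR. Both helpers collect the point ranges (the values that would end up in an IN list) and skip the intervals.

```java
import java.util.ArrayList;
import java.util.List;

class RangeIterationSketch {

  /** Hypothetical stand-in for a range of integers; a null bound means "unbounded". */
  static final class SimpleRange {
    final Integer lower;
    final Integer upper;

    SimpleRange(Integer lower, Integer upper) {
      this.lower = lower;
      this.upper = upper;
    }

    /** A point range has identical, non-null bounds. */
    boolean isPoint() {
      return lower != null && lower.equals(upper);
    }
  }

  /** Hypothetical callback interface: one method per range shape. */
  interface RangeVisitor {
    void point(int value);

    void interval(Integer lower, Integer upper);
  }

  /** Dispatches each range to the callback that matches its shape. */
  static void forEach(List<SimpleRange> ranges, RangeVisitor visitor) {
    for (SimpleRange r : ranges) {
      if (r.isPoint()) {
        visitor.point(r.lower);
      } else {
        visitor.interval(r.lower, r.upper);
      }
    }
  }

  /** Callback style: the shape dispatch is hidden inside forEach. */
  static List<Integer> pointsViaVisitor(List<SimpleRange> ranges) {
    List<Integer> points = new ArrayList<>();
    forEach(ranges, new RangeVisitor() {
      @Override
      public void point(int value) {
        points.add(value);
      }

      @Override
      public void interval(Integer lower, Integer upper) {
        // Intervals are ignored in this sketch.
      }
    });
    return points;
  }

  /** Loop style: iterate over the ranges directly and branch on the shape. */
  static List<Integer> pointsViaLoop(List<SimpleRange> ranges) {
    List<Integer> points = new ArrayList<>();
    for (SimpleRange r : ranges) {
      if (r.isPoint()) {
        points.add(r.lower);
      }
    }
    return points;
  }

  public static void main(String[] args) {
    List<SimpleRange> ranges = List.of(
        new SimpleRange(1, 1),       // point
        new SimpleRange(3, 7),       // bounded interval
        new SimpleRange(9, 9),       // point
        new SimpleRange(12, null));  // unbounded above
    System.out.println(pointsViaVisitor(ranges));
    System.out.println(pointsViaLoop(ranges));
  }
}
```

With only two kinds of action the loop is shorter; the callback style tends to pay off when every range shape needs its own handling.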

I tried simplifying the code a bit, see
https://github.com/thomasrebele/hive/commit/29cb98b2b0aba10a7b29749aef3e770dae667433.
It's a bit less efficient than Ruben's proposal, but it might be a bit more
readable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]