zabetak commented on code in PR #6293:
URL: https://github.com/apache/hive/pull/6293#discussion_r2826328984
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -184,91 +188,284 @@ public Double visitCall(RexCall call) {
return selectivity;
}
+ /**
+ * If the cast can be removed, just return its operand and adjust the boundaries if necessary.
+ *
+ * <p>
+ * In Hive, if a value cannot be represented by the cast, the result of the cast is NULL,
+ * and therefore cannot fulfill the predicate. So the possible range of the values
+ * is limited by the range of possible values of the type.
+ * </p>
+ *
+ * <p>
+ * Special care is taken to support the cast to DECIMAL(precision, scale):
+ * The cast to DECIMAL rounds the value the same way as {@link RoundingMode#HALF_UP}.
+ * The boundaries are adjusted accordingly, without changing the semantics of <code>inclusive</code>.
+ * </p>
+ *
+ * @param cast a RexCall of type {@link SqlKind#CAST}
+ * @param tableScan the table that provides the statistics
+ * @param boundaries indexes 0 and 1 are the boundaries of the range predicate;
+ * indexes 2 and 3, if they exist, will be set to the boundaries of the type range
+ * @param inclusive whether the respective boundary is inclusive or exclusive.
+ * @return the operand if the cast can be removed, otherwise the cast itself
+ */
+ private RexNode removeCastIfPossible(RexCall cast, HiveTableScan tableScan, float[] boundaries, boolean[] inclusive) {
Review Comment:
Here we have a trade-off between precision and complexity. More complex code
gives us better precision, but it takes longer to write, test, review, and
maintain. Personally, I would be OK with using `RexUtil#isLosslessCast` and
sacrificing some precision in favor of simpler code, especially since we don't
have any real data points about the importance of handling narrowing casts and
decimals. I assume that from your point of view precision is more important,
which is why you opted for the more complex solution. Since you are the one
driving this, I will let you decide how you prefer that we move forward.
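
To make the trade-off concrete, here is a minimal JDK-only sketch of the kind of
boundary adjustment a DECIMAL cast requires. It is not the PR's code: the class
`DecimalCastBounds` and both method names are hypothetical, and it assumes a
strictly positive bound that is already representable at the target scale.
Negative bounds, zero, and the upper limit of the type range each need their
own handling, which is where most of the extra complexity would come from.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration only; not part of the PR or of Hive.
final class DecimalCastBounds {

  // Emulates the HALF_UP rounding that the quoted Javadoc attributes to
  // CAST(x AS DECIMAL(precision, scale)).
  static BigDecimal castToScale(BigDecimal value, int scale) {
    return value.setScale(scale, RoundingMode.HALF_UP);
  }

  // Returns the smallest x for which castToScale(x, scale) >= bound.
  // Assumption: bound is strictly positive and has no digits beyond `scale`.
  static BigDecimal lowestSourceValue(BigDecimal bound, int scale) {
    if (bound.signum() <= 0 || bound.stripTrailingZeros().scale() > scale) {
      throw new IllegalArgumentException("sketch only covers positive bounds at the target scale");
    }
    // Half of one unit in the last place at the target scale, e.g. 0.05 for scale 1.
    BigDecimal halfUlp = BigDecimal.ONE.movePointLeft(scale).divide(BigDecimal.valueOf(2));
    return bound.subtract(halfUlp);
  }

  public static void main(String[] args) {
    // Predicate: CAST(x AS DECIMAL(3, 1)) >= 1.5
    BigDecimal adjusted = lowestSourceValue(new BigDecimal("1.5"), 1);
    System.out.println("adjusted lower bound on x: " + adjusted);
    System.out.println("cast of adjusted bound:    " + castToScale(adjusted, 1));
    System.out.println("cast of 1.449:             " + castToScale(new BigDecimal("1.449"), 1));
  }
}
```

With the simpler alternative, a cast that is not lossless would be left in place
and the estimator would not attempt this kind of adjustment at all.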
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]