zabetak commented on code in PR #6293:
URL: https://github.com/apache/hive/pull/6293#discussion_r2826328984


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -184,91 +188,284 @@ public Double visitCall(RexCall call) {
     return selectivity;
   }
 
+  /**
+   * If the cast can be removed, just return its operand and adjust the boundaries if necessary.
+   *
+   * <p>
+   *   In Hive, if a value cannot be represented by the cast, the result of the cast is NULL,
+   *   and therefore cannot fulfill the predicate. So the possible range of the values
+   *   is limited by the range of possible values of the type.
+   * </p>
+   *
+   * <p>
+   *   Special care is taken to support the cast to DECIMAL(precision, scale):
+   *   The cast to DECIMAL rounds the value the same way as {@link RoundingMode#HALF_UP}.
+   *   The boundaries are adjusted accordingly, without changing the semantics of <code>inclusive</code>.
+   * </p>
+   *
+   * @param cast a RexCall of type {@link SqlKind#CAST}
+   * @param tableScan the table that provides the statistics
+   * @param boundaries indexes 0 and 1 are the boundaries of the range predicate;
+   *                   indexes 2 and 3, if they exist, will be set to the boundaries of the type range
+   * @param inclusive whether the respective boundary is inclusive or exclusive.
+   * @return the operand if the cast can be removed, otherwise the cast itself
+   */
+  private RexNode removeCastIfPossible(RexCall cast, HiveTableScan tableScan, float[] boundaries, boolean[] inclusive) {

Review Comment:
   Here we have a trade-off between precision and complexity. More complex code 
gives us better precision, but it takes longer to write, test, review, and 
maintain. Personally, I would be OK with using `RexUtil#isLosslessCast` and 
sacrificing some precision in favor of simpler code, especially since we don't 
have any real data points about the importance of handling narrowing casts and 
decimals. I assume that from your point of view precision is more important, 
and thus you opted for the more complex solution. Since you are the one driving 
this, I will let you decide how you prefer that we move forward.
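
To make the cost of the precise approach concrete, here is a minimal sketch of the kind of boundary adjustment that `HALF_UP` rounding implies. It is not the code from this PR: the class and method names are hypothetical, it uses `BigDecimal` rather than the `float[]` boundaries in the signature above, it only covers a positive lower bound, and it ignores precision overflow (where the cast would yield NULL).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration only; none of these names exist in Hive or Calcite.
final class HalfUpBoundSketch {
  private HalfUpBoundSketch() {}

  /** Emulates only the rounding step of CAST(v AS DECIMAL(p, scale)). */
  public static BigDecimal roundLikeDecimalCast(BigDecimal v, int scale) {
    return v.setScale(scale, RoundingMode.HALF_UP);
  }

  /**
   * For a positive bound at the given scale, returns the smallest input that
   * still rounds to the bound: bound - 0.5 * 10^-scale.
   * Assumption: bound > 0, so HALF_UP rounds the midpoint upwards.
   */
  public static BigDecimal widenedLowerBound(BigDecimal bound, int scale) {
    BigDecimal halfUnit = BigDecimal.valueOf(5).movePointLeft(scale + 1);
    return bound.subtract(halfUnit);
  }

  public static void main(String[] args) {
    BigDecimal bound = new BigDecimal("1.2");
    // CAST(x AS DECIMAL(p, 1)) >= 1.2 holds for every x >= this value.
    BigDecimal lower = widenedLowerBound(bound, 1);
    System.out.println("widened lower bound: " + lower);
    System.out.println("rounds to: " + roundLikeDecimalCast(lower, 1));
    System.out.println("just below rounds to: "
        + roundLikeDecimalCast(new BigDecimal("1.149"), 1));
  }
}
```

Even this restricted case needs sign handling, overflow handling, and care with inclusivity before it is correct in general, which is the extra complexity that `RexUtil#isLosslessCast` would let us avoid.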



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
