thomasrebele commented on code in PR #6293:
URL: https://github.com/apache/hive/pull/6293#discussion_r2832481804


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -184,91 +188,284 @@ public Double visitCall(RexCall call) {
     return selectivity;
   }
 
+  /**
+   * If the cast can be removed, just return its operand and adjust the boundaries if necessary.
+   *
+   * <p>
+   *   In Hive, if a value cannot be represented by the cast, the result of the cast is NULL,
+   *   and therefore cannot fulfill the predicate. So the possible range of the values
+   *   is limited by the range of possible values of the type.
+   * </p>
+   *
+   * <p>
+   *   Special care is taken to support the cast to DECIMAL(precision, scale):
+   *   The cast to DECIMAL rounds the value the same way as {@link RoundingMode#HALF_UP}.
+   *   The boundaries are adjusted accordingly, without changing the semantics of <code>inclusive</code>.
+   * </p>
+   *
+   * @param cast a RexCall of type {@link SqlKind#CAST}
+   * @param tableScan the table that provides the statistics
+   * @param boundaries indexes 0 and 1 are the boundaries of the range predicate;
+   *                   indexes 2 and 3, if they exist, will be set to the boundaries of the type range
+   * @param inclusive whether the respective boundary is inclusive or exclusive.
+   * @return the operand if the cast can be removed, otherwise the cast itself
+   */
+  private RexNode removeCastIfPossible(RexCall cast, HiveTableScan tableScan, float[] boundaries, boolean[] inclusive) {
+    RexNode op0 = cast.getOperands().getFirst();
+    if (!(op0 instanceof RexInputRef)) {
+      return cast;
+    }
+    int index = ((RexInputRef) op0).getIndex();
+    final List<ColStatistics> colStats = tableScan.getColStat(Collections.singletonList(index));
+    if (colStats.isEmpty()) {
+      return cast;
+    }
+
+    // we need to check that the possible values of the input to the cast are all within the type range of the cast
+    // otherwise the CAST introduces some modulo-like behavior (*)
+    ColStatistics colStat = colStats.getFirst();
+    ColStatistics.Range range = colStat.getRange();
+    if (range == null)
+      return cast;
+    if (range.minValue == null || Double.isNaN(range.minValue.doubleValue()))
+      return cast;
+    if (range.maxValue == null || Double.isNaN(range.maxValue.doubleValue()))
+      return cast;
+
+    String type = cast.getType().getSqlTypeName().getName();
+
+    double min;
+    double max;
+    switch (type.toLowerCase()) {

Review Comment:
   I've refactored the switch and verified that the getLimit call yields the 
same min/max values.
   
   I don't know whether there's a limit for FLOAT/DOUBLE, so I've created 
[CALCITE-7419](https://issues.apache.org/jira/browse/CALCITE-7419) for the 
discussion.
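
For illustration, the DECIMAL behavior described in the Javadoc above can be sketched in plain Java. This is a minimal sketch and not the PR's code: the class `DecimalCastSketch` and its methods `maxOfDecimal` and `castToDecimal` are hypothetical names, and the sketch assumes only what the Javadoc states (rounding like `RoundingMode.HALF_UP`, NULL for values the target type cannot represent).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration; not part of FilterSelectivityEstimator.
public class DecimalCastSketch {

  /** Largest value representable by DECIMAL(precision, scale), e.g. 99.9 for DECIMAL(3, 1). */
  public static BigDecimal maxOfDecimal(int precision, int scale) {
    return BigDecimal.TEN.pow(precision).subtract(BigDecimal.ONE).movePointLeft(scale);
  }

  /**
   * Assumed cast semantics: round HALF_UP to the target scale and
   * return null (standing in for SQL NULL) if the result is out of range.
   */
  public static BigDecimal castToDecimal(BigDecimal value, int precision, int scale) {
    BigDecimal rounded = value.setScale(scale, RoundingMode.HALF_UP);
    BigDecimal max = maxOfDecimal(precision, scale);
    return rounded.abs().compareTo(max) > 0 ? null : rounded;
  }

  public static void main(String[] args) {
    System.out.println(castToDecimal(new BigDecimal("1.44"), 3, 1));  // 1.4
    System.out.println(castToDecimal(new BigDecimal("1.45"), 3, 1));  // 1.5
    System.out.println(castToDecimal(new BigDecimal("99.94"), 3, 1)); // 99.9
    System.out.println(castToDecimal(new BigDecimal("99.95"), 3, 1)); // null
  }
}
```

Under these assumptions, with DECIMAL(3, 1) every input in [1.45, 1.55) maps to 1.5, so a range predicate on the cast corresponds to a slightly shifted range on the operand, and inputs that round beyond 99.9 can never satisfy the predicate.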



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

