mustafasrepo commented on code in PR #5384:
URL: https://github.com/apache/arrow-datafusion/pull/5384#discussion_r1117404087
##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -426,95 +426,128 @@ impl ExprRewriter for TypeCoercionRewriter {
}
}
-/// Casts the ScalarValue `value` to coerced type.
-// When coerced type is `Interval` we use `parse_interval` since
`try_from_string` not
-// supports conversion from string to Interval
-fn convert_to_coerced_type(
- coerced_type: &DataType,
- value: &ScalarValue,
-) -> Result<ScalarValue> {
+/// Casts the given `value` to `target_type`. Note that this function
+/// only considers `Null` or `Utf8` values.
+fn coerce_scalar(target_type: &DataType, value: &ScalarValue) ->
Result<ScalarValue> {
match value {
- // In here we do casting either for NULL types or
- // ScalarValue::Utf8(Some(val)). The other types are already casted.
- // The reason is that we convert the sqlparser result
- // to the Utf8 for all possible cases. Hence the types other than Utf8
- // are already casted to appropriate type. Therefore they can be
returned directly.
+ // Coerce Utf8 values:
ScalarValue::Utf8(Some(val)) => {
- // we need special handling for Interval types
- if let DataType::Interval(..) = coerced_type {
+ // When `target_type` is `Interval`, we use `parse_interval` since
+ // `try_from_string` does not support `String` to `Interval`
coercions.
+ if let DataType::Interval(..) = target_type {
parse_interval("millisecond", val)
} else {
- ScalarValue::try_from_string(val.clone(), coerced_type)
+ ScalarValue::try_from_string(val.clone(), target_type)
}
}
s => {
if s.is_null() {
- ScalarValue::try_from(coerced_type)
+ // Coerce `Null` values:
+ ScalarValue::try_from(target_type)
} else {
+ // Values except `Utf8`/`Null` variants already have the right
type
+ // (casted before) since we convert `sqlparser` outputs to
`Utf8`
+ // for all possible cases. Therefore, we return a clone here.
Ok(s.clone())
}
}
}
}
+/// This function coerces `value` to `target_type` in a range-aware fashion.
+/// If the coercion is successful, we return an `Ok` value with the result.
+/// If the coercion fails because `target_type` is not wide enough (i.e. we
+/// can not coerce to `target_type`, but we can to a wider type in the same
+/// family), we return a `Null` value of this type to signal this situation.
+/// Downstream code uses this signal to treat these values as *unbounded*.
+fn coerce_scalar_range_aware(
+ target_type: &DataType,
+ value: &ScalarValue,
+) -> Result<ScalarValue> {
+ coerce_scalar(target_type, value).or_else(|err| {
+ // If type coercion fails, check if the largest type in family works:
+ if let Some(largest_type) = get_widest_type_in_family(target_type) {
+ coerce_scalar(largest_type, value).map_or_else(
+ |_| {
+ Err(DataFusionError::Execution(format!(
+ "Cannot cast {:?} to {:?}",
+ value, target_type
+ )))
+ },
+ |_| ScalarValue::try_from(target_type),
Review Comment:
Consider the case when target type is `Int8` and range value used in the
query is 10000 like following: `OVER(ORDER BY c1 RANGE BETWEEN 10000 PRECEDING
AND 10000 FOLLOWING)`). 10000 cannot be casted to `Int8`. In this case we try
to cast value to `Int64` type. If it succeeds it means that 10000 couldn't be
casted to `Int8` because of overflow in the first case. It means that we can
treat the `OVER(ORDER BY c1 RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING)`
as `OVER(ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING)` without loss of generality, since we know that range would cover
the whole table. In this case we are not interested in the casted version (We
ignore the result. Then return `NULL` to convert range to `UNBOUNDED`. By the
way we also use `NULL` for the types in the form `ScalarValue::Int8(None)` not
just for the types `ScalarValue::Null` in case that terminology is
misleading.). We just need to know if value can be casted to large type. If
large
type casting fails also it means that some strange range entered not
compatible with the original column like following: `OVER(ORDER BY c1 RANGE
BETWEEN '3 day' PRECEDING AND '1 hour' FOLLOWING)`. In that case we return
error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]