AdamGS commented on code in PR #17808:
URL: https://github.com/apache/datafusion/pull/17808#discussion_r2387513761
##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -955,28 +963,106 @@ pub fn decimal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Data
match (lhs_type, rhs_type) {
// Prefer decimal data type over floating point for comparison operation
+ (Decimal32(_, _), Decimal32(_, _)) => get_wider_decimal_type(lhs_type, rhs_type),
+ (Decimal32(_, _), Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _)) => {
+ get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+ }
+ (Decimal32(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
+ (Decimal64(_, _), Decimal64(_, _)) => get_wider_decimal_type(lhs_type, rhs_type),
+ (Decimal64(_, _), Decimal32(_, _) | Decimal128(_, _) | Decimal256(_, _)) => {
+ get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+ }
+ (Decimal64(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
(Decimal128(_, _), Decimal128(_, _)) => {
get_wider_decimal_type(lhs_type, rhs_type)
}
+ (Decimal128(_, _), Decimal32(_, _) | Decimal64(_, _) | Decimal256(_, _)) => {
+ get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+ }
(Decimal128(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
- (_, Decimal128(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
(Decimal256(_, _), Decimal256(_, _)) => {
get_wider_decimal_type(lhs_type, rhs_type)
}
+ (Decimal256(_, _), Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _)) => {
+ get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+ }
(Decimal256(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
+ (_, Decimal32(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
+ (_, Decimal64(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
+ (_, Decimal128(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
(_, Decimal256(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
(_, _) => None,
}
}
+/// Handle cross-variant decimal widening by choosing the larger variant
+fn get_wider_decimal_type_cross_variant(
+ lhs_type: &DataType,
+ rhs_type: &DataType,
+) -> Option<DataType> {
+ use arrow::datatypes::DataType::*;
+
+ let (p1, s1) = match lhs_type {
+ Decimal32(p, s) => (*p, *s),
+ Decimal64(p, s) => (*p, *s),
+ Decimal128(p, s) => (*p, *s),
+ Decimal256(p, s) => (*p, *s),
+ _ => return None,
+ };
+
+ let (p2, s2) = match rhs_type {
+ Decimal32(p, s) => (*p, *s),
+ Decimal64(p, s) => (*p, *s),
+ Decimal128(p, s) => (*p, *s),
+ Decimal256(p, s) => (*p, *s),
+ _ => return None,
+ };
+
+ // max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
+ let s = s1.max(s2);
+ let range = (p1 as i8 - s1).max(p2 as i8 - s2);
+ let required_precision = (range + s) as u8;
Review Comment:
I looked around a bit, and what I could find is:
1. DataFusion already has multiple issues about cast overflow and precision
loss (https://github.com/apache/datafusion/issues/16406,
https://github.com/apache/datafusion/issues/13492). I'm happy to take those on,
but they are unrelated to this PR.
2. Spark (which seems to be the main inspiration for this code) has a
configuration to control how it handles these cases
([here](https://github.com/apache/spark/blob/1c81ad20296d34f137238dadd67cc6ae405944eb/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L172)
and
[here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L427)).
I'm not sure what the desired behavior is regarding precision loss (should it
be configurable? Is there an accepted behavior today?). For this PR, I think it
is fine to return `None` when the required precision overflows, and to move the
broader conversation into an issue where people can weigh in. I'll be glad to
take that forward. What do you think?
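
To make the "return `None` on overflow" suggestion concrete, here is a minimal, hedged sketch. It is not the PR's code: `Decimal`, `max_precision`, and `widen_or_none` are hypothetical names, and the struct stands in for the decimal variants of arrow's `DataType` so the example compiles on its own. It assumes the usual Arrow maximum precisions (9, 18, 38, 76) and does the arithmetic in `i16`, which also sidesteps the possible `i8` overflow in `p1 as i8 - s1` when a scale is strongly negative.

```rust
/// Hypothetical stand-in for Decimal32/64/128/256(precision, scale).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Decimal {
    pub bits: u16, // 32, 64, 128 or 256
    pub precision: u8,
    pub scale: i8,
}

/// Assumed maximum precision per variant (matches Arrow's constants).
pub fn max_precision(bits: u16) -> Option<u8> {
    match bits {
        32 => Some(9),
        64 => Some(18),
        128 => Some(38),
        256 => Some(76),
        _ => None,
    }
}

/// Sketch: compute max(s1, s2) + max(p1 - s1, p2 - s2), then pick the
/// narrowest variant that is at least as wide as both inputs and can hold
/// that precision. Returns `None` if even the 256-bit variant overflows.
pub fn widen_or_none(lhs: Decimal, rhs: Decimal) -> Option<Decimal> {
    // Widen to i16 so neither the subtraction nor the sum can overflow.
    let scale = i16::from(lhs.scale.max(rhs.scale));
    let range = (i16::from(lhs.precision) - i16::from(lhs.scale))
        .max(i16::from(rhs.precision) - i16::from(rhs.scale));
    let required = range + scale;
    let min_bits = lhs.bits.max(rhs.bits);

    [32u16, 64, 128, 256]
        .into_iter()
        .filter(|bits| *bits >= min_bits)
        .find(|bits| {
            max_precision(*bits)
                .map_or(false, |max| (1..=i16::from(max)).contains(&required))
        })
        .map(|bits| Decimal {
            bits,
            precision: required as u8, // safe: 1 <= required <= 76 here
            scale: scale as i8,        // safe: the max of two i8 values
        })
}
```

With this shape, coercing a 256-bit decimal of precision 76 and scale 0 against a 128-bit decimal of precision 38 and scale 10 needs precision 86 and yields `None`, rather than silently clamping. Whether clamping, erroring, or a config switch is preferable is exactly the question for the follow-up issue.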
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]