Re: [PR] More decimal 32/64 support - type coercsion and misc gaps [datafusion]

via GitHub Sat, 18 Oct 2025 05:38:25 -0700


AdamGS commented on code in PR #17808:
URL: https://github.com/apache/datafusion/pull/17808#discussion_r2387513761



##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -955,28 +963,106 @@ pub fn decimal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Data
 
     match (lhs_type, rhs_type) {
         // Prefer decimal data type over floating point for comparison operation
+        (Decimal32(_, _), Decimal32(_, _)) => get_wider_decimal_type(lhs_type, rhs_type),
+        (Decimal32(_, _), Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _)) => {
+            get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+        }
+        (Decimal32(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
+        (Decimal64(_, _), Decimal64(_, _)) => get_wider_decimal_type(lhs_type, rhs_type),
+        (Decimal64(_, _), Decimal32(_, _) | Decimal128(_, _) | Decimal256(_, _)) => {
+            get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+        }
+        (Decimal64(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
         (Decimal128(_, _), Decimal128(_, _)) => {
             get_wider_decimal_type(lhs_type, rhs_type)
         }
+        (Decimal128(_, _), Decimal32(_, _) | Decimal64(_, _) | Decimal256(_, _)) => {
+            get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+        }
         (Decimal128(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
-        (_, Decimal128(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
         (Decimal256(_, _), Decimal256(_, _)) => {
             get_wider_decimal_type(lhs_type, rhs_type)
         }
+        (Decimal256(_, _), Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _)) => {
+            get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
+        }
         (Decimal256(_, _), _) => get_common_decimal_type(lhs_type, rhs_type),
+        (_, Decimal32(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
+        (_, Decimal64(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
+        (_, Decimal128(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
         (_, Decimal256(_, _)) => get_common_decimal_type(rhs_type, lhs_type),
         (_, _) => None,
     }
 }
 
+/// Handle cross-variant decimal widening by choosing the larger variant
+fn get_wider_decimal_type_cross_variant(
+    lhs_type: &DataType,
+    rhs_type: &DataType,
+) -> Option<DataType> {
+    use arrow::datatypes::DataType::*;
+
+    let (p1, s1) = match lhs_type {
+        Decimal32(p, s) => (*p, *s),
+        Decimal64(p, s) => (*p, *s),
+        Decimal128(p, s) => (*p, *s),
+        Decimal256(p, s) => (*p, *s),
+        _ => return None,
+    };
+
+    let (p2, s2) = match rhs_type {
+        Decimal32(p, s) => (*p, *s),
+        Decimal64(p, s) => (*p, *s),
+        Decimal128(p, s) => (*p, *s),
+        Decimal256(p, s) => (*p, *s),
+        _ => return None,
+    };
+
+    // max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
+    let s = s1.max(s2);
+    let range = (p1 as i8 - s1).max(p2 as i8 - s2);
+    let required_precision = (range + s) as u8;

Review Comment:
   I looked around a bit, and what I could find is:
   1. DataFusion already has multiple issues about cast overflow and precision loss (https://github.com/apache/datafusion/issues/16406, https://github.com/apache/datafusion/issues/13492). I'm happy to take those on, but they are unrelated to this PR.
   2. Spark (which seems to be the main inspiration for this code) has a configuration option that controls how it handles these cases ([here](https://github.com/apache/spark/blob/1c81ad20296d34f137238dadd67cc6ae405944eb/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L172) and [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L427)).
   
   I'm not sure what the desired behavior is regarding precision loss: should it be configurable, and is there currently an accepted behavior? For this PR, I think it should be fine to just return `None` if the precision overflows, and to move the bigger conversation into an issue where people can weigh in. I'll be glad to take that forward. What do you think?
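
To make the "return `None` on overflow" option concrete, here is a minimal, self-contained sketch. It is not the PR's code: `DecimalWidth` and `widen` are hypothetical stand-ins for the Arrow `DataType` decimal variants and for `get_wider_decimal_type_cross_variant`, and it assumes the result uses the larger of the two variants, as the doc comment in the diff says. The maximum precisions (9, 18, 38, 76) are the Arrow limits for the 32-, 64-, 128- and 256-bit decimals.

```rust
/// Hypothetical stand-in for the four Arrow decimal variants.
/// Declaration order gives D32 < D64 < D128 < D256 via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DecimalWidth {
    D32,
    D64,
    D128,
    D256,
}

impl DecimalWidth {
    /// Maximum precision each variant can represent.
    fn max_precision(self) -> u8 {
        match self {
            DecimalWidth::D32 => 9,
            DecimalWidth::D64 => 18,
            DecimalWidth::D128 => 38,
            DecimalWidth::D256 => 76,
        }
    }
}

/// Hypothetical helper: compute the common (variant, precision, scale),
/// returning `None` when the required precision does not fit in the
/// larger of the two input variants.
fn widen(
    lhs: (DecimalWidth, u8, i8),
    rhs: (DecimalWidth, u8, i8),
) -> Option<(DecimalWidth, u8, i8)> {
    let (w1, p1, s1) = lhs;
    let (w2, p2, s2) = rhs;

    // max(s1, s2) + max(p1 - s1, p2 - s2), max(s1, s2)
    // The arithmetic is done in i16 so that large precisions or negative
    // scales cannot wrap around.
    let s = s1.max(s2);
    let range = (p1 as i16 - s1 as i16).max(p2 as i16 - s2 as i16);
    let required_precision = range + s as i16;

    // Assumption: the result uses the larger input variant.
    let width = w1.max(w2);
    if required_precision < 1 || required_precision > width.max_precision() as i16 {
        // Precision overflow: no lossless common type in this variant.
        return None;
    }
    Some((width, required_precision as u8, s))
}
```

For example, under these assumptions `widen((DecimalWidth::D32, 9, 0), (DecimalWidth::D64, 18, 18))` needs precision 27, which does not fit in a 64-bit decimal, so it returns `None` rather than silently truncating. Whether it should instead promote to the next variant or cap the precision is the open question for the follow-up issue.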


