Re: [PR] Support different scales for decimal binary math functions [datafusion]

via GitHub Wed, 11 Feb 2026 15:07:39 -0800


theirix commented on PR #19874:
URL: https://github.com/apache/datafusion/pull/19874#issuecomment-3887735901


   Hello @Jefffrey, 
   
   I agree, there aren't many functions that support both decimal types. Most 
likely, it will be only `gcd` and `lcm`, aside from user-defined functions. For 
other functions, the changes are mostly neutral.
   
   The rationale for this PR is to support these symmetric functions for both 
decimal types and avoid scaling issues, as shown below. 
   
   This implementation of `gcd` (`lcm` can be done similarly) with this PR is 
shown here -
   
https://github.com/apache/datafusion/compare/f9ffc8242c3c253efdc3cd73e2a9bebbe6de0393...theirix:datafusion:gcd-decimal-new-api
 . It uses a new API and utilises a new arrow num_traits support, so decimals 
are treated as any other numbers.
   
   When I tried to [implement it using the old 
function](https://github.com/theirix/datafusion/commit/0768bcdcaa2c1ab915de3da3b8724c35edcf994a),
 I immediately encountered a problem with the old function 
`calculate_binary_decimal_math` because it doesn't take scale into account, so 
the result is incorrect. Shortly, it failed with `select gcd(2::decimal(38, 0), 
3::decimal(38, 0));`, with a cast of a proper input decimal (`3::Decimal`) to a 
raw arrow type `Decimal128Type`, abruptly adding default scale 
`DECIMAL_DEFAULT_SCALE=10`, so it becomes `30000000000`. It didn't happen if 
the second argument is a non-decimal number (pow, log, etc). Logs:
   <details>
   ```
   [2026-02-09T22:17:42Z INFO  datafusion_functions::math::gcd] invoke gcd with 
PrimitiveArray<Decimal128(38, 0)>
       [
         2,
       ] and Scalar(Decimal128(Some(3),38,0))
   [2026-02-09T22:17:42Z INFO  datafusion_functions::utils] Calculating binary 
math with left PrimitiveArray<Decimal128(38, 0)>
       [
         2,
       ] and right Scalar(Decimal128(Some(30000000000),38,10))
   [2026-02-09T22:17:42Z INFO  datafusion_functions::math::gcd] 
euclid_gcd_unsigned a=2 b=30000000000 -> Ok(2)
   [2026-02-09T22:17:42Z INFO  datafusion_functions::utils] Calculating binary 
math with result PrimitiveArray<Decimal128(38, 10)>
       [
         2,
       ]
   ```
   </details>
   
   This implementation avoids this bug by properly analysing scale, so 
rescaling shouldn't be handled in the UDF code. If you check the implementation 
closely, it uses rescaling only for the decimal-decimal case. Other cases have 
code almost identical to the old function. 
   
   I can see two approaches for implementing decimal-decimal math functions:
   
   1. (as in this PR). Fix the utility function to be scale-aware to prevent 
correctness issues. It has more code, but handles all cases properly with 
casting, scaling, and error checking. UDFs remain simple.
   
   2. Handle it per-UDF. Move this logic to decimal-decimal UDFs (`gcd`, 
`lcm`). It keeps the utils function lean. At the same time, it adds more 
complex logic on the UDF side, for both of them. It may also lead to similar 
bugs and edge cases.
   
   I favour the first approach for consistency, but I'm happy to refactor with 
the second approach if you find it more suitable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Support different scales for decimal binary math functions [datafusion]

Reply via email to