Re: [PR] fix: Spark-compatible HALF_UP rounding for round() on float types [datafusion]

via GitHub Tue, 09 Jun 2026 08:20:30 -0700


comphead commented on code in PR #22813:
URL: https://github.com/apache/datafusion/pull/22813#discussion_r3381811607



##########
datafusion/spark/src/function/math/round.rs:
##########
@@ -187,20 +190,43 @@ fn get_scale(args: &[ColumnarValue]) -> Result<Option<i32>> {
 /// round_float(125.0, -1) → 130.0
 /// ```
 fn round_float<T: num_traits::Float>(value: T, scale: i32) -> T {
-    if scale >= 0 {
-        let factor = T::from(10.0f64.powi(scale)).unwrap_or_else(T::infinity);
-        if factor.is_infinite() {
-            // Very large positive scale — value is already precise enough, return as-is
-            return value;
-        }
-        (value * factor).round() / factor
-    } else {
-        let factor = T::from(10.0f64.powi(-scale)).unwrap_or_else(T::infinity);
-        if factor.is_infinite() {
-            // Very large negative scale — any finite value rounds to 0
-            return T::zero();
-        }
-        (value / factor).round() * factor
+    // Widen to f64 first. For f32 inputs this matches Spark's `f.toDouble`
+    // step (FloatType: `BigDecimal(f.toDouble).setScale(..).toFloat`), which
+    // exposes the binary-float error before rounding. For f64 it is a no-op.
+    let Some(d) = value.to_f64() else {
+        return value;
+    };
+
+    // Spark returns NaN / ±Inf unchanged; BigDecimal cannot represent them.
+    if !d.is_finite() {
+        return value;
+    }
+
+    // `d.to_string()` produces the shortest round-trip decimal string, matching
+    // Scala's `BigDecimal(d) = java.math.BigDecimal.valueOf(d)` semantics. So
+    // `round(1.255_f64, 2)` parses "1.255" and rounds to 1.26 (not the naive
+    // binary-float 1.25).
+    let Ok(bd) = BigDecimal::from_str(&d.to_string()) else {
+        // Should not happen for a finite f64, but fall back gracefully.
+        return value;
+    };
+
+    // A finite f64 carries at most ~324 fractional decimal digits and saturates
+    // below ~1e309 in magnitude, so any `scale` past those bounds is already a
+    // no-op (large positive) or collapses the value to zero (large negative).
+    // Clamp before `with_scale_round` so adversarial input such as
+    // `round(x, i32::MAX)` cannot drive an unbounded `10^scale` BigInt
+    // allocation. The clamp is exact for every finite f64.
+    let clamped_scale = i64::from(scale).clamp(-340, 340);
+

Review Comment:
   Good observation. Spark 4.1.2 has ANSI mode ON by default, and in DataFusion we have only just started to support it.
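The HALF_UP logic in the diff can be illustrated without the `bigdecimal` crate. The block below is a minimal, std-only sketch and not the PR's implementation. It assumes that Rust's `Display` for `f64` yields the same shortest round-trip digits the diff parses, and it rounds by editing that digit string directly instead of going through `BigDecimal::with_scale_round`. The names `round_half_up` and `round_half_up_f32` are hypothetical and were introduced only for this illustration.

```rust
/// Hypothetical sketch (not DataFusion API): Spark-style HALF_UP rounding of a
/// finite f64, done on its shortest round-trip decimal string.
fn round_half_up(d: f64, scale: i32) -> f64 {
    // NaN and ±Inf pass through unchanged, as in the diff.
    if !d.is_finite() {
        return d;
    }
    // Display prints the shortest round-trip digits without an exponent,
    // e.g. "1.255", "-125" or "0.0001".
    let s = d.to_string();
    let (neg, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.as_str()),
    };
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    // All decimal digits of |d|, most significant first.
    let digits: Vec<u8> = int_part
        .bytes()
        .chain(frac_part.bytes())
        .map(|b| b - b'0')
        .collect();

    // Number of leading digits that survive rounding to `scale` places.
    let keep = int_part.len() as i64 + i64::from(scale);
    if keep >= digits.len() as i64 {
        // At most `scale` fractional digits already: nothing to round.
        return d;
    }
    if keep < 0 {
        // |d| < 10^(-scale) / 10, so HALF_UP rounds it to zero.
        return 0.0;
    }
    let keep = keep as usize;

    // HALF_UP: round away from zero when the first dropped digit is >= 5.
    let mut kept = digits[..keep].to_vec();
    if digits[keep] >= 5 {
        // Add one to the kept digits, propagating the carry leftwards.
        let mut i = kept.len();
        loop {
            if i == 0 {
                kept.insert(0, 1); // carry out of the top digit, e.g. 999 -> 1000
                break;
            }
            i -= 1;
            if kept[i] == 9 {
                kept[i] = 0;
            } else {
                kept[i] += 1;
                break;
            }
        }
    }
    // java.math.BigDecimal has no negative zero, so a zero result is +0.0 here.
    if kept.iter().all(|&x| x == 0) {
        return 0.0;
    }

    // The kept digits form an integer N; the result is N * 10^(-scale).
    let mantissa: String = kept.iter().map(|&x| char::from(b'0' + x)).collect();
    let sign = if neg { "-" } else { "" };
    format!("{}{}e{}", sign, mantissa, -i64::from(scale))
        .parse::<f64>()
        .unwrap_or(d)
}

/// FloatType path: widen to f64 first (Spark's `f.toDouble`), round, narrow.
fn round_half_up_f32(v: f32, scale: i32) -> f32 {
    round_half_up(f64::from(v), scale) as f32
}
```

For example, `round_half_up(1.255, 2)` returns `1.26`, whereas the naive `(1.255_f64 * 100.0).round() / 100.0` returns `1.25`. Similarly, `round_half_up_f32(2.675, 2)` returns `2.67`, because widening exposes that `2.675_f32` is actually stored as roughly `2.67499995`. The sketch only ever touches digits that already exist, so it never builds `10^scale` and needs no clamp. A `BigDecimal`-based version does build that power, and the ±340 bound in the diff guards against exactly this.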
