sjhddh opened a new pull request, #22813: URL: https://github.com/apache/datafusion/pull/22813
## Which issue does this PR close? Closes #22812. ## Rationale for this change `round_float` used naive binary-float arithmetic `(value * factor).round() / factor`, which diverges from Apache Spark's `RoundBase`. Spark evaluates `BigDecimal(d).setScale(scale, HALF_UP)`, and `BigDecimal(Double)` parses the shortest round-trip decimal string of the double. As a result: ```sql SELECT round(1.255::double, 2::int); -- Spark 1.26, was 1.25 SELECT round(1.005::double, 2::int); -- Spark 1.01, was 1.0 ``` ## What changes are included in this PR? - Reimplement `round_float` to mirror Spark: widen to `f64` (matching Spark's `f.toDouble` for `FloatType`), pass NaN/Inf through unchanged, then round via a `BigDecimal` built from the value's shortest-string representation with `RoundingMode::HalfUp` (ties away from zero). - `scale` is clamped to ±340 before constructing the decimal. A finite `f64` carries at most ~324 fractional digits and saturates above ~1e309, so any larger magnitude is already a no-op or collapses to zero. This also avoids an unbounded `10^scale` `BigInt` allocation on adversarial input like `round(x, i32::MAX)`. - Unit tests for the divergent cases, regression guards (`2.675`, `8.35`), negative values and scales, ties-away-from-zero, NaN/Inf pass-through, and bounded extreme scales. - sqllogictest coverage for the double path. ## Are these changes tested? Yes. New unit tests in `round.rs` and new cases in `spark/math/round.slt`. The full `datafusion-spark` test suite passes and `clippy` is clean locally. ## Are there any user-facing changes? `round()` on `float`/`double` now matches Spark at the half-way point. This is a correctness fix; results that previously diverged from Spark will change. Note on performance: the `BigDecimal` path is heavier than the prior multiply/divide. If maintainers prefer, I'm happy to add a fast path for the common small-scale case and fall back to `BigDecimal` only when needed — let me know your preference. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
