sjhddh opened a new pull request, #22813:
URL: https://github.com/apache/datafusion/pull/22813

   ## Which issue does this PR close?
   
   Closes #22812.
   
   ## Rationale for this change
   
   `round_float` used naive binary-float arithmetic `(value * factor).round() / 
factor`, which diverges from Apache Spark's `RoundBase`. Spark evaluates 
`BigDecimal(d).setScale(scale, HALF_UP)`, and `BigDecimal(Double)` parses the 
shortest round-trip decimal string of the double. As a result:
   
   ```sql
   SELECT round(1.255::double, 2::int);  -- Spark 1.26, was 1.25
   SELECT round(1.005::double, 2::int);  -- Spark 1.01, was 1.0
   ```
   
   ## What changes are included in this PR?
   
   - Reimplement `round_float` to mirror Spark: widen to `f64` (matching 
Spark's `f.toDouble` for `FloatType`), pass NaN/Inf through unchanged, then 
round via a `BigDecimal` built from the value's shortest-string representation 
with `RoundingMode::HalfUp` (ties away from zero).
   - `scale` is clamped to ±340 before constructing the decimal. A finite `f64` 
carries at most ~324 fractional digits and saturates above ~1e309, so any 
larger magnitude is already a no-op or collapses to zero. This also avoids an 
unbounded `10^scale` `BigInt` allocation on adversarial input like `round(x, 
i32::MAX)`.
   - Unit tests for the divergent cases, regression guards (`2.675`, `8.35`), 
negative values and scales, ties-away-from-zero, NaN/Inf pass-through, and 
bounded extreme scales.
   - sqllogictest coverage for the double path.
   
   ## Are these changes tested?
   
   Yes. New unit tests in `round.rs` and new cases in `spark/math/round.slt`. 
The full `datafusion-spark` test suite passes and `clippy` is clean locally.
   
   ## Are there any user-facing changes?
   
   `round()` on `float`/`double` now matches Spark at the half-way point. This 
is a correctness fix; results that previously diverged from Spark will change.
   
   Note on performance: the `BigDecimal` path is heavier than the prior 
multiply/divide. If maintainers prefer, I'm happy to add a fast path for the 
common small-scale case and fall back to `BigDecimal` only when needed — let me 
know your preference.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to