sjhddh opened a new issue, #22812:
URL: https://github.com/apache/datafusion/issues/22812

   ### Describe the bug
   
   The Spark-compatible `round()` function gives different results from Apache 
Spark when the input is a floating-point type (`FloatType`/`DoubleType`) and 
the value's binary representation differs slightly from its decimal literal, 
for example a literal that is an exact tie at the target scale but is stored 
just below it.
   
   Spark's `RoundBase` rounds a double as `BigDecimal(d).setScale(scale, 
HALF_UP)`, where `BigDecimal(Double)` is `java.math.BigDecimal.valueOf(d)`, 
i.e. it parses the *shortest round-trip decimal string* of the double 
(`Double.toString`). DataFusion's `round_float` instead uses naive binary-float 
arithmetic, `(value * 10^scale).round() / 10^scale`, which rounds the 
already-imprecise binary value and so diverges from Spark at half-way points.
   
   ### To Reproduce
   
   ```sql
   SELECT round(1.255::double, 2::int);
   -- Spark:      1.26
   -- DataFusion: 1.25
   
   SELECT round(1.005::double, 2::int);
   -- Spark:      1.01
   -- DataFusion: 1.0
   ```
   
   The cause is that `1.255` and `1.005` are stored as binary doubles a hair 
below the decimal value (`1.2549999999999999...`, `1.00499999999999989...`). 
Spark sees the shortest decimal string (`"1.255"`, `"1.005"`) and applies 
HALF_UP, so the tie rounds away from zero. DataFusion multiplies the raw binary 
value by `100`; the product (`125.49999999999999`, `100.49999999999999`) stays 
below the half-way point, so the result is rounded down.
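
The divergence can be reproduced outside DataFusion. The following is a minimal Rust sketch, not the actual `round_float` source: `round_naive` and `show` are hypothetical helpers, and `round_naive` only implements the `(value * 10^scale).round() / 10^scale` formula quoted in this issue.

```rust
/// Hypothetical helper (not DataFusion's code): the naive formula from this
/// issue, `(value * 10^scale).round() / 10^scale`.
fn round_naive(value: f64, scale: i32) -> f64 {
    let factor = 10f64.powi(scale);
    // `f64::round` rounds half away from zero, but it only ever sees the
    // binary product, which may already sit below the decimal tie.
    (value * factor).round() / factor
}

/// Prints the stored value, the scaled product, and the naive result so the
/// three can be compared by eye.
fn show(value: f64, scale: i32) {
    println!("stored : {:.20}", value);
    println!("scaled : {:.20}", value * 10f64.powi(scale));
    println!("rounded: {}", round_naive(value, scale));
}
```

Calling `show(1.255, 2)` or `show(1.005, 2)` should print a stored value and a scaled product that both sit just below the tie, which is why the naive result is rounded down.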
   
   ### Expected behavior
   
   Match Spark: round via the shortest round-trip decimal representation with 
HALF_UP (ties away from zero), for both `DoubleType` and `FloatType` (Spark 
widens float to double first via `f.toDouble`).
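
One way to get this behavior without a decimal library is to round the digits of the shortest round-trip string directly. The sketch below is an illustration under stated assumptions, not the proposed fix: `round_half_up_shortest` is a hypothetical name, it handles only non-negative scales, it returns NaN and infinities unchanged, and it relies on Rust's `Display` for `f64` printing the shortest digits that round-trip, used here as a stand-in for Java's `Double.toString`.

```rust
/// Hypothetical sketch: HALF_UP rounding applied to the shortest round-trip
/// decimal string of `value`. Assumes a non-negative `scale`.
fn round_half_up_shortest(value: f64, scale: u32) -> f64 {
    if !value.is_finite() {
        return value; // assumption: NaN and infinities pass through
    }
    let scale = scale as usize;
    // Rust's `{}` prints plain decimal digits (no exponent) for f64.
    let s = format!("{}", value.abs());
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s.as_str(), ""),
    };
    if frac_part.len() <= scale {
        return value; // nothing to round away
    }
    // Keep the integer digits plus `scale` fractional digits.
    let mut digits: Vec<u8> = int_part
        .bytes()
        .chain(frac_part.bytes().take(scale))
        .map(|b| b - b'0')
        .collect();
    // HALF_UP on the magnitude: round up when the first dropped digit is >= 5.
    if frac_part.as_bytes()[scale] >= b'5' {
        let mut i = digits.len();
        loop {
            if i == 0 {
                digits.insert(0, 1); // carry out of the leading digit
                break;
            }
            i -= 1;
            if digits[i] == 9 {
                digits[i] = 0;
            } else {
                digits[i] += 1;
                break;
            }
        }
    }
    // Rebuild the decimal string and parse it back to a double.
    let int_len = digits.len() - scale;
    let mut out = String::new();
    for (idx, d) in digits.iter().enumerate() {
        if idx == int_len {
            out.push('.');
        }
        out.push((b'0' + *d) as char);
    }
    let rounded: f64 = out.parse().expect("digits form a valid decimal");
    if value < 0.0 { -rounded } else { rounded }
}
```

With this sketch, `round_half_up_shortest(1.255, 2)` and `round_half_up_shortest(1.005, 2)` see the digits `1.255` and `1.005` rather than the binary values, so the ties round away from zero. For `FloatType`, the input would first be widened to `f64`, mirroring Spark's `f.toDouble`.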
   
   ### Additional context
   
   The existing doc comment on `round_float` already describes the intended 
`BigDecimal` / HALF_UP behaviour; the implementation simply doesn't match it. I 
have a fix and will open a PR referencing this issue.
   
   Affected file: `datafusion/spark/src/function/math/round.rs`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

