andygrove opened a new issue, #4729:
URL: https://github.com/apache/datafusion-comet/issues/4729
### Describe the bug
`SUM(<decimal>)` over a **sliding** window frame (any frame whose lower
bound is not `UNBOUNDED PRECEDING`) returns a wrapped, out-of-range
`Decimal128` value when the running sum overflows the result decimal precision,
instead of Spark's `NULL`.
Native execution routes sliding-frame decimal `SUM` to DataFusion's built-in
`sum`, whose accumulator uses `add_wrapping` / `sub_wrapping`
(`datafusion-functions-aggregate` `sum.rs`) and never applies Spark's
decimal-overflow-to-`NULL` semantics. Ever-expanding frames (lower bound
`UNBOUNDED PRECEDING`) correctly use Comet's `SumDecimal` UDAF and return
`NULL` on overflow, matching Spark.
The Rust code already acknowledges this gap in a comment in
`process_agg_func` (`native/core/src/execution/planner.rs`):
> For sliding frames, those UDAFs can't be used (no retract_batch), so
delegate to DataFusion's built-in `sum`, which supports retract but doesn't
enforce Spark's decimal precision overflow-to-NULL.
but `CometWindowExec.convert` does not fall back to Spark for this case, so
the wrong result reaches the user. The wrapped value is even outside the
declared `Decimal(38, 0)` range, which makes Spark's result decoding throw
`EXPRESSION_DECODING_FAILED` / `NUMERIC_VALUE_OUT_OF_RANGE` when collecting the
Comet result.
This is decimal-specific. Bigint sliding `SUM` overflow matches Spark (both
wrap in two's complement under non-ANSI), and ever-expanding decimal `SUM`
overflow matches Spark (both `NULL`).
### Steps to reproduce
```sql
statement
CREATE TABLE dec_ovf(g int, v decimal(38,0)) USING parquet
statement
INSERT INTO dec_ovf VALUES
(1, 90000000000000000000000000000000000000),
(1, 90000000000000000000000000000000000000),
(1, 90000000000000000000000000000000000000)
-- sliding frame (lower bound = CURRENT ROW): routes to DataFusion built-in
sum
query
SELECT v, SUM(v) OVER (PARTITION BY g ORDER BY v
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS tail_sum
FROM dec_ovf
```
Comet produces a wrapped value such as
`-160282366920938463463374607431768211456`, which then fails result decoding:
```
org.apache.spark.SparkRuntimeException: [EXPRESSION_DECODING_FAILED] ...
Caused by: org.apache.spark.SparkArithmeticException:
[NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION]
The -160282366920938463463374607431768211456 rounded half up ... cannot be
represented as Decimal(38, 0).
```
### Expected behavior
Comet returns `NULL` for window rows whose decimal sum overflows the result
precision, matching Spark non-ANSI semantics (and throwing under ANSI). The
most direct fix is to fall back to Spark for decimal `SUM` with a
non-ever-expanding frame in `CometWindowExec`, mirroring the existing
RANGE-frame DATE/DECIMAL fallbacks.
### Additional context
- Surfaced by an audit of the native window support added in #4209.
- The non-overflow sliding decimal `SUM` path is correct and should stay
native; only the overflow case diverges.
- The same DataFusion-built-in routing applies to sliding decimal `AVG`, but
on Spark 4.x decimal `AVG` over a window currently falls back to Spark for an
unrelated serde reason, so it is not reachable there. It may be reachable on
Spark 3.x and should be checked as part of the fix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]