andygrove opened a new issue, #4731:
URL: https://github.com/apache/datafusion-comet/issues/4731

   ### Describe the bug
   
   `AVG(<decimal>)` over a window falls back to Spark on Spark 4.x, even though 
`CometWindowExec` reports `AVG` as a natively supported window aggregate and 
`process_agg_func` has an `AvgDecimal` window branch.
   
   On Spark 4.x, `Average` of a decimal is represented in the physical plan as 
a `Cast(avg(UnscaledValue(child)) / pow10 AS decimal(p, s))` wrapping the 
`WindowExpression`, for example:
   
   ```
   Window [cast((avg(UnscaledValue(v#39)) windowspecdefinition(...) / 100.0) as 
decimal(14,6)) AS run_avg#58]
   ```
   
   `CometWindowExec.convert` only unwraps two shapes:
   
   ```scala
   case Alias(w: WindowExpression, _) => w
   case Alias(MakeDecimal(w: WindowExpression, _, _, _), _) => w
   case other => withFallbackReason(...); return None
   ```
   
   The `Cast(Divide(...))` form matches neither, so the whole `WindowExec` 
falls back to Spark. Results stay correct (Spark computes them), but:
   
   - The `AvgDecimal` native window branch in `process_agg_func` 
(`native/core/src/execution/planner.rs`) is effectively dead on Spark 4.x.
   - `AVG` decimal window aggregates never run natively on Spark 4.x, contrary 
to the coverage implied by #4209.
   
   ### Steps to reproduce
   
   ```sql
   statement
   CREATE TABLE dec_avg(g int, v decimal(10,2)) USING parquet
   
   statement
   INSERT INTO dec_avg VALUES (1, 10.10), (1, 20.25), (1, 30.33), (1, 41.00)
   
   -- runs natively on integer/double, falls back on decimal under Spark 4.x
   SELECT v, AVG(v) OVER (PARTITION BY g ORDER BY v
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_avg
   FROM dec_avg
   ```
   
   Comet falls back: `Unsupported window expression: 
cast((avg(UnscaledValue(v)) ... / 100.0) as decimal(14,6))`.
   
   ### Expected behavior
   
   `AVG(<decimal>)` over an ever-expanding window frame runs natively via the 
`AvgDecimal` UDAF and matches Spark, the same way `SUM(<decimal>)` already 
does. This needs `CometWindowExec.convert` to recognize the Spark 4.x 
`Cast(Divide(WindowExpression, ...))` average shape (in addition to the 
existing `MakeDecimal` shape).
   
   ### Additional context
   
   - Surfaced by an audit of the native window support added in #4209.
   - The `MakeDecimal` unwrap suggests decimal `AVG` is reachable on some Spark 
versions (likely 3.x); the divergent shape is the 4.x `Cast(Divide(...))` form. 
Behavior should be confirmed across 3.4 / 3.5 / 4.0 / 4.1 when fixing.
   - Once native decimal `AVG` is reachable, the sliding-frame overflow guard 
added for #4729 (which also covers `AVG`) becomes the relevant safeguard for 
sliding frames.
   - This is a coverage gap, not a correctness divergence: fallback produces 
correct results.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to