andygrove opened a new issue, #4482:
URL: https://github.com/apache/datafusion-comet/issues/4482
## Describe the bug
Spark documents that `array_max` and `array_min` treat NaN as greater than
any non-NaN value for `Float`/`Double` element arrays (Spark uses
`SQLOrderingUtil.compareFloats`/`compareDoubles`). DataFusion's
`array_max`/`array_min` go through Arrow's `partial_cmp`-based kernels, which
follow IEEE 754 semantics, where any comparison involving NaN is unordered.
For arrays containing NaN, the two implementations produce different results:
- `array_max(array(double('NaN'), 1.0, 2.0))` returns `NaN` in Spark but may
return `2.0` or `NULL` in Comet, depending on kernel behaviour.
- `array_min(array(double('NaN'), 1.0, 2.0))` is expected to return `1.0` in
both, but the Comet result is fragile because it relies on how the kernel
handles the unordered NaN comparison.
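
To illustrate why a `partial_cmp`-based reduction is fragile, here is a minimal Rust sketch. `partial_cmp_max` is a hypothetical helper written for this issue, not Arrow's actual kernel; it only assumes that a candidate replaces the running maximum when it compares strictly greater.

```rust
use std::cmp::Ordering;

/// Hypothetical max reduction built only on `partial_cmp` (NOT Arrow's kernel).
/// A candidate replaces the accumulator only when it compares strictly greater.
fn partial_cmp_max(values: &[f64]) -> Option<f64> {
    let mut iter = values.iter().copied();
    // An empty input has no maximum.
    let first = iter.next()?;
    Some(iter.fold(first, |acc, x| match x.partial_cmp(&acc) {
        Some(Ordering::Greater) => x,
        // `None` means one operand is NaN: the comparison is unordered,
        // so the accumulator is kept, whatever it currently holds.
        _ => acc,
    }))
}
```

With this sketch, `[NaN, 1.0, 2.0]` reduces to `NaN` (the NaN is never displaced), while `[1.0, NaN, 2.0]` reduces to `2.0` (the NaN is never selected). The result depends on where the NaN sits in the array, which is the kind of divergence from Spark described above.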
This was surfaced by the array-expressions audit (collection PR queue). The
only literal test covering these functions in `CometArrayExpressionSuite` uses
`array(double('-Infinity'), 0.0, double('Infinity'))`, which contains no NaN,
so CI does not currently catch the divergence.
## Steps to reproduce
```sql
SELECT array_max(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
-- Spark: NaN
-- Comet: varies (likely 2.0 or NULL)
SELECT array_min(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
-- Spark: 1.0
-- Comet: varies
```
## Expected behavior
Either implement Spark's NaN ordering on the Comet side or downgrade
`array_max` / `array_min` to `Incompatible(Some(...))` for `FloatType` /
`DoubleType` element arrays so they only run with
`spark.comet.expression.ArrayMax.allowIncompatible=true` (and the matching
`ArrayMin` flag).
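
For the first option, a minimal Rust sketch of a NaN-largest ordering might look like the following. The names `nan_largest_cmp`, `array_max_nan_largest`, and `array_min_nan_largest` are hypothetical and do not exist in Comet or DataFusion. The sketch assumes the ordering described above (NaN equal to itself and greater than any non-NaN value) and models null elements as `None`, which are skipped.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: NaN equals NaN and is greater than every other value.
fn nan_largest_cmp(x: f64, y: f64) -> Ordering {
    match (x.is_nan(), y.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither operand is NaN here, so `partial_cmp` always returns `Some`.
        (false, false) => x.partial_cmp(&y).expect("neither operand is NaN"),
    }
}

/// Maximum of the non-null elements; `None` if there are none.
fn array_max_nan_largest(values: &[Option<f64>]) -> Option<f64> {
    values
        .iter()
        .flatten() // skip null elements
        .copied()
        .max_by(|a, b| nan_largest_cmp(*a, *b))
}

/// Minimum of the non-null elements; `None` if there are none.
fn array_min_nan_largest(values: &[Option<f64>]) -> Option<f64> {
    values
        .iter()
        .flatten() // skip null elements
        .copied()
        .min_by(|a, b| nan_largest_cmp(*a, *b))
}
```

Under these assumptions, the maximum of `[NaN, 1.0, 2.0]` is `NaN` and the minimum is `1.0`, regardless of element order. A real fix would need to apply the same ordering inside the native array kernels rather than over a Rust slice.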
## Additional context
- Comet serdes: `CometArrayMax`, `CometArrayMin` in
`spark/src/main/scala/org/apache/comet/serde/arrays.scala`.
- Spark reference: `ArrayMax.evalInternal` / `ArrayMin.evalInternal` in
`collectionOperations.scala`; uses `getInterpretedOrdering` which routes
through `SQLOrderingUtil` for floats and doubles.