andygrove opened a new issue, #4482:
URL: https://github.com/apache/datafusion-comet/issues/4482

   ## Describe the bug
   
   Spark documents that `array_max` and `array_min` treat NaN as greater than 
any non-NaN value for `Float`/`Double` element arrays (Spark uses 
`SQLOrderingUtil.compareFloats`/`compareDoubles`). DataFusion's 
`array_max`/`array_min` go through Arrow's `partial_cmp`-based kernels, which 
follow IEEE 754 semantics, under which any comparison involving NaN is unordered.
   
   For arrays containing NaN, the two implementations produce different results:
   
   - `array_max(array(double('NaN'), 1.0, 2.0))` returns `NaN` in Spark but 
may return `2.0` or `NULL` in Comet, depending on kernel behaviour.
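   - `array_min(array(double('NaN'), 1.0, 2.0))` returns `1.0` in both, but 
Comet only matches because of how the kernel happens to handle NaN, not 
because it implements Spark's ordering, so the result is not guaranteed.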
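
   The ordering difference can be illustrated with a small, self-contained Scala model. This is a sketch under stated assumptions, not Spark's or Comet's code: `compareDoubles` models the semantics the issue attributes to `SQLOrderingUtil.compareDoubles` (NaN is greater than every non-NaN value and equal to itself), and `ieeeArrayMax` is a hypothetical naive fold using IEEE `>`, not Arrow's actual kernel. All names here are illustrative.

   ```scala
   object NaNOrderingSketch {
     // Spark-style total order for doubles (modelled, not copied from Spark):
     // `x == y` first makes -0.0 and 0.0 compare equal (assumed to match Spark);
     // otherwise java.lang.Double.compare places NaN above +Infinity and
     // treats NaN as equal to NaN.
     def compareDoubles(x: Double, y: Double): Int =
       if (x == y) 0 else java.lang.Double.compare(x, y)

     // Hypothetical array_max/array_min with Spark semantics (nulls ignored).
     def sparkArrayMax(xs: Seq[Double]): Option[Double] =
       xs.reduceOption((a, b) => if (compareDoubles(a, b) >= 0) a else b)

     def sparkArrayMin(xs: Seq[Double]): Option[Double] =
       xs.reduceOption((a, b) => if (compareDoubles(a, b) <= 0) a else b)

     // Hypothetical naive fold using IEEE `>`: every comparison involving NaN
     // is false, so the result depends on where the NaN sits in the array.
     def ieeeArrayMax(xs: Seq[Double]): Option[Double] =
       xs.reduceOption((a, b) => if (b > a) b else a)

     def main(args: Array[String]): Unit = {
       val nanFirst = Seq(Double.NaN, 1.0, 2.0)
       val nanMiddle = Seq(1.0, Double.NaN, 2.0)
       println(sparkArrayMax(nanFirst))  // Some(NaN)
       println(sparkArrayMin(nanFirst))  // Some(1.0)
       println(sparkArrayMax(nanMiddle)) // Some(NaN)
       println(ieeeArrayMax(nanFirst))   // Some(NaN)
       println(ieeeArrayMax(nanMiddle))  // Some(2.0)
     }
   }
   ```

   With the NaN in the middle of the array, the naive IEEE fold returns `2.0` while the Spark-style ordering still returns `NaN`: the position-dependent kind of divergence described above.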
   
   Surfaced by the array-expressions audit (collection PR queue). The only 
literal test covering these functions in `CometArrayExpressionSuite` uses 
`array(double('-Infinity'), 0.0, double('Infinity'))`, which contains no NaN, 
so CI does not currently catch the divergence.
   
   ## Steps to reproduce
   
   ```sql
   SELECT array_max(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
   -- Spark:  NaN
   -- Comet:  varies (likely 2.0 or NULL)
   
   SELECT array_min(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
   -- Spark:  1.0
   -- Comet:  varies
   ```
   
   ## Expected behavior
   
   Either implement Spark's NaN ordering on the Comet side or downgrade 
`array_max` / `array_min` to `Incompatible(Some(...))` for `FloatType` / 
`DoubleType` element arrays so they only run with 
`spark.comet.expression.ArrayMax.allowIncompatible=true` (and the matching 
`ArrayMin` flag).
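
   The downgrade option amounts to gating native execution on the array's element type. The sketch below uses hypothetical stand-in types (`SupportLevel`, `Compatible`, `Incompatible`, `ElementType`) instead of Comet's real serde API, whose names and signatures may differ; only the `Incompatible(Some(...))` shape and the `spark.comet.expression.<Name>.allowIncompatible` flag pattern come from this issue.

   ```scala
   // Hypothetical stand-ins for Comet's support-level types; not the real API.
   sealed trait SupportLevel
   case object Compatible extends SupportLevel
   final case class Incompatible(notes: Option[String]) extends SupportLevel

   // Hypothetical, simplified element types for the array argument.
   sealed trait ElementType
   case object IntElem extends ElementType
   case object FloatElem extends ElementType
   case object DoubleElem extends ElementType

   object ArrayMinMaxSupport {
     // Float/Double element arrays are marked incompatible because NaN
     // ordering differs from Spark; other element types stay compatible.
     def supportLevel(elementType: ElementType): SupportLevel = elementType match {
       case FloatElem | DoubleElem =>
         Incompatible(Some("NaN ordering differs from Spark for Float/Double arrays"))
       case _ => Compatible
     }

     // Flag name pattern taken from the issue, e.g. "ArrayMax" or "ArrayMin".
     def allowIncompatibleKey(exprName: String): String =
       s"spark.comet.expression.$exprName.allowIncompatible"

     // Incompatible expressions run natively only when the user opts in.
     def runsNatively(exprName: String, elementType: ElementType,
                      conf: Map[String, String]): Boolean =
       supportLevel(elementType) match {
         case Compatible      => true
         case Incompatible(_) => conf.getOrElse(allowIncompatibleKey(exprName), "false").toBoolean
       }
   }
   ```

   For example, `ArrayMinMaxSupport.runsNatively("ArrayMax", DoubleElem, Map.empty)` is `false`, and it becomes `true` only when `spark.comet.expression.ArrayMax.allowIncompatible` is set to `true`; each expression is gated by its own flag.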
   
   ## Additional context
   
   - Comet serdes: `CometArrayMax`, `CometArrayMin` in 
`spark/src/main/scala/org/apache/comet/serde/arrays.scala`.
   - Spark reference: `ArrayMax.evalInternal` / `ArrayMin.evalInternal` in 
`collectionOperations.scala`; uses `getInterpretedOrdering` which routes 
through `SQLOrderingUtil` for floats and doubles.
   


-- 
This is an automated message from the Apache Git Service.