ritchie46 opened a new pull request #9416: URL: https://github.com/apache/arrow/pull/9416
@jorgecarleitao @Dandandan @nevi-me @alamb Before I continue on this I want to start a discussion on what the behavior of aggregation should be. I started this after I got this issue: https://github.com/ritchie46/polars/issues/328. It boils down to this: ```rust let a = Float64Array::from_iter_values(vec![1.0, f64::NAN]); dbg!(min(&a)); dbg!(max(&a)); ``` ``` [arrow/src/compute/kernels/aggregate.rs:905] min(&a) = Some( 1.0, ) [arrow/src/compute/kernels/aggregate.rs:906] max(&a) = Some( NaN, ) ``` I initially thought this was a bug, but then I see this behavior is asserted in the tests as being valid. However this is different behavior than that of most numerical tools I know (e.g. numpy, tensorflow, torch, etc.). [The IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard states that _"Any comparison with a NaN is treated as unordered."_. But if a max aggregation, differs from a min aggregation with regards to NaN this implies to me that we currently treat NaN as ordered and that NaN is larger than any number. IMO we should return NaN for both the max and the min kernel and may also add a variant that excludes the NaNs. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org