jhorstmann commented on code in PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#discussion_r1401197670
##########
arrow-array/src/arithmetic.rs:
##########
@@ -377,9 +437,16 @@ macro_rules! native_type_float_op {
};
}
-native_type_float_op!(f16, f16::ZERO, f16::ONE);
-native_type_float_op!(f32, 0., 1.);
-native_type_float_op!(f64, 0., 1.);
+native_type_float_op!(
+ f16,
+ f16::ZERO,
+ f16::ONE,
+ f16::from_bits(f16::NAN.to_bits() ^ 0x8000),
+ f16::NAN,
+ u16
+);
+native_type_float_op!(f32, 0., 1., -f32::NAN, f32::NAN, u32);
Review Comment:
So the canonical `f32::NAN` is not the largest NAN according to `total_cmp`.
Its bit pattern is `7fc00000` and the following asserts all pass:
```rust
let max_bits = f32::from_bits(i32::MAX as _);
assert!(max_bits.is_nan());
assert!(max_bits.is_sign_positive());
let min_bits = f32::from_bits(-1 as _);
assert!(min_bits.is_nan());
assert!(min_bits.is_sign_negative());
assert!(min_bits.total_cmp(&-f32::NAN).is_lt());
assert!(max_bits.total_cmp(&f32::NAN).is_gt())
```
So we should probably use these bit patterns as identities. Using the
canonical values as identity could have one benefit, it would normalize the
output of the min/max kernels to a canonical NaN if there are multiple NaN
values with different bit patterns. How are different NaN values handled
elsewhere, for example in the group by implementation of datafusion, would they
be considered as separate groups? If so, we should probably also distinguish
them here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]