Re: [PR] Use Total Ordering for Aggregates and Refactor for Better Auto-Vectorization [arrow-rs]

via GitHub Tue, 21 Nov 2023 13:12:23 -0800


jhorstmann commented on code in PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#discussion_r1401197670



##########
arrow-array/src/arithmetic.rs:
##########
@@ -377,9 +437,16 @@ macro_rules! native_type_float_op {
     };
 }
 
-native_type_float_op!(f16, f16::ZERO, f16::ONE);
-native_type_float_op!(f32, 0., 1.);
-native_type_float_op!(f64, 0., 1.);
+native_type_float_op!(
+    f16,
+    f16::ZERO,
+    f16::ONE,
+    f16::from_bits(f16::NAN.to_bits() ^ 0x8000),
+    f16::NAN,
+    u16
+);
+native_type_float_op!(f32, 0., 1., -f32::NAN, f32::NAN, u32);

Review Comment:
   So the canonical `f32::NAN` is not the largest NAN according to `total_cmp`. 
Its bit pattern is `7fc00000` and the following asserts all pass:
   
   ```rust
           let max_bits = f32::from_bits(i32::MAX as _);
           assert!(max_bits.is_nan());
           assert!(max_bits.is_sign_positive());
   
           let min_bits = f32::from_bits(-1 as _);
           assert!(min_bits.is_nan());
           assert!(min_bits.is_sign_negative());
   
           assert!(min_bits.total_cmp(&-f32::NAN).is_lt());
           assert!(max_bits.total_cmp(&f32::NAN).is_gt())
   ```
   
   So we should probably use these bit patterns as identities. Using the 
canonical values as identity could have one benefit, it would normalize the 
output of the min/max kernels to a canonical NaN if there are multiple NaN 
values with different bit patterns. How are different NaN values handled 
elsewhere, for example in the group by implementation of datafusion, would they 
be considered as separate groups? If so, we should probably also distinguish 
them here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Use Total Ordering for Aggregates and Refactor for Better Auto-Vectorization [arrow-rs]

Reply via email to