crepererum opened a new issue #255:
URL: https://github.com/apache/arrow-rs/issues/255


   **Describe the bug**
   NaN can end up as the min or max in Parquet statistics, overriding all 
other values in the column. This is very similar to 
[PARQUET-1225](https://issues.apache.org/jira/browse/PARQUET-1225), which was 
filed for the C++ implementation.
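
As a general illustration of how a NaN can take over min/max (not a claim about how the crate computes its statistics), here is a minimal sketch. The helper `naive_min_max` is hypothetical: it seeds min and max from the first value and updates them with `<` and `>`. Every ordered comparison against NaN is false, so a leading NaN is never replaced.

```rust
/// Hypothetical naive min/max: seeds from the first value and updates
/// with `<` / `>`. Not taken from the parquet crate.
fn naive_min_max(values: &[f32]) -> Option<(f32, f32)> {
    let mut iter = values.iter().copied();
    let first = iter.next()?; // empty input: no statistics
    let (mut min, mut max) = (first, first);
    for v in iter {
        // Both comparisons are false whenever `min`/`max` (or `v`) is NaN,
        // so a NaN seed is never overwritten.
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    Some((min, max))
}
```

With this sketch, the input `[f32::NAN, 1.0, 2.0]` yields NaN for both min and max, while an input without NaNs behaves as expected.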
   
   **To Reproduce**
   Add the following tests:
   
   ```rust
   #[test]
   fn test_float_statistics_nan_middle() {
       let stats = statistics_roundtrip::<FloatType>(&[1.0, f32::NAN, 2.0]);
       assert!(stats.has_min_max_set());
       if let Statistics::Float(stats) = stats {
           assert_eq!(stats.min(), &1.0);
           assert_eq!(stats.max(), &2.0);
       } else {
           panic!("expecting Statistics::Float");
       }
   }
   
   #[test]
   fn test_float_statistics_nan_start() {
       let stats = statistics_roundtrip::<FloatType>(&[f32::NAN, 1.0, 2.0]);
       assert!(stats.has_min_max_set());
       if let Statistics::Float(stats) = stats {
           assert_eq!(stats.min(), &1.0);
           assert_eq!(stats.max(), &2.0);
       } else {
           panic!("expecting Statistics::Float");
       }
   }
   
   #[test]
   fn test_float_statistics_nan_only() {
       let stats = statistics_roundtrip::<FloatType>(&[f32::NAN, f32::NAN]);
       assert!(!stats.has_min_max_set());
       assert!(matches!(stats, Statistics::Float(_)));
   }
   
   fn statistics_roundtrip<T: DataType>(values: &[<T as DataType>::T]) -> Statistics {
       let page_writer = get_test_page_writer();
       let props = Arc::new(WriterProperties::builder().build());
       let mut writer = get_test_column_writer::<T>(page_writer, 0, 0, props);
       writer.write_batch(values, None, None).unwrap();
   
       let (_bytes_written, _rows_written, metadata) = writer.close().unwrap();
       if let Some(stats) = metadata.statistics() {
           stats.clone()
       } else {
           panic!("metadata missing statistics");
       }
   }
   ```
   
   **Note that while the tests are written for `f32`/float, this also applies 
to `f64`/double.**
   
   **Expected behavior**
   NaNs should be ignored during stats calculation. If only NaNs are present, 
the min and max values should be unset.
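
The expected behavior could be sketched roughly as follows. The helper `min_max_ignoring_nan` is hypothetical and only models the rule described above on a plain `f32` slice; it is not the crate's API and says nothing about where such a check would live in the column writer.

```rust
/// Hypothetical helper: min/max over the non-NaN values only.
/// Returns `None` when there are no non-NaN values, which models
/// "min and max unset".
fn min_max_ignoring_nan(values: &[f32]) -> Option<(f32, f32)> {
    values
        .iter()
        .copied()
        .filter(|v| !v.is_nan()) // skip NaNs entirely
        .fold(None, |acc, v| match acc {
            None => Some((v, v)), // first non-NaN value seeds both bounds
            Some((min, max)) => Some((min.min(v), max.max(v))),
        })
}
```

For the three test inputs above, this sketch gives `Some((1.0, 2.0))` for `[1.0, NaN, 2.0]` and `[NaN, 1.0, 2.0]`, and `None` for `[NaN, NaN]`. The same approach would apply to `f64`.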
   
   **Additional context**
   Tested commit was `8f030db53d9eda901c82db9daf94339fc447d0db`.
   

