Jefffrey opened a new issue, #5047:
URL: https://github.com/apache/arrow-rs/issues/5047
**Describe the bug**

The parquet-format README (https://github.com/apache/parquet-format/blob/46cc3a0647d301bb9579ca8dd2cc356caf2a72d2/README.md?plain=1#L162-L178) specifies:

```
* FLOAT, DOUBLE - Signed comparison with special handling of NaNs and signed zeros.
  The details are documented in the [Thrift definition](src/main/thrift/parquet.thrift)
  in the `ColumnOrder` union. They are summarized here but the Thrift definition is
  considered authoritative:
  * NaNs should not be written to min or max statistics fields.
  * If the computed max value is zero (whether negative or positive), `+0.0` should be
    written into the max statistics field.
  * If the computed min value is zero (whether negative or positive), `-0.0` should be
    written into the min statistics field.

  For backwards compatibility when reading files:
  * If the min is a NaN, it should be ignored.
  * If the max is a NaN, it should be ignored.
  * If the min is +0, the row group may contain -0 values as well.
  * If the max is -0, the row group may contain +0 values as well.
  * When looking for NaN values, min and max should be ignored.
```

Specifically, the writer does not follow the points about computed max and min values that are negative or positive zero.
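The quoted write and read rules can be sketched in Rust as below. This is a minimal illustration under stated assumptions, not the parquet crate's implementation: `float_min_max` and `read_float_bounds` are hypothetical helpers. The writer side skips NaNs, computes min/max with ordinary comparisons, then normalizes a zero min to `-0.0` and a zero max to `+0.0`. The reader side ignores NaN bounds and widens a `+0.0` min to `-0.0` (and a `-0.0` max to `+0.0`), since files from older writers may not have normalized signed zeros.

```rust
/// Hypothetical writer-side helper (not part of the parquet crate):
/// computes min/max statistics for FLOAT values following the
/// parquet-format rules. Returns `None` if there are no non-NaN values.
fn float_min_max(values: &[f32]) -> Option<(f32, f32)> {
    // NaNs must never be written to min or max, so drop them up front.
    let mut iter = values.iter().copied().filter(|v| !v.is_nan());
    let first = iter.next()?;
    let (mut min, mut max) = (first, first);
    for v in iter {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    // `== 0.0` is true for both +0.0 and -0.0, so this catches either sign.
    // A zero min is always written as -0.0 ...
    if min == 0.0 {
        min = -0.0;
    }
    // ... and a zero max is always written as +0.0.
    if max == 0.0 {
        max = 0.0;
    }
    Some((min, max))
}

/// Hypothetical reader-side helper: interprets stored min/max per the
/// backwards-compatibility rules. A NaN bound is ignored (`None`); a zero
/// min is treated as -0.0 and a zero max as +0.0, so a range check covers
/// zeros of both signs regardless of how the file was written.
fn read_float_bounds(min: f32, max: f32) -> (Option<f32>, Option<f32>) {
    let min = if min.is_nan() {
        None
    } else if min == 0.0 {
        Some(-0.0)
    } else {
        Some(min)
    };
    let max = if max.is_nan() {
        None
    } else if max == 0.0 {
        Some(0.0)
    } else {
        Some(max)
    };
    (min, max)
}
```

With these helpers, a column containing only `0.0` yields min `-0.0` and max `+0.0`, which is exactly what the reproducing test below expects.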
**To Reproduce**

Add the following test to [parquet/src/column/writer/mod.rs](https://github.com/apache/arrow-rs/blob/91acfb07a9929a2d6721c5417e47c0c472372a86/parquet/src/column/writer/mod.rs):

```rust
#[test]
fn test_float_statistics_zero_only() {
    // `statistics_roundtrip` is an existing helper in this test module.
    let stats = statistics_roundtrip::<FloatType>(&[0.0]);
    assert!(stats.has_min_max_set());
    assert!(stats.is_min_max_backwards_compatible());
    if let Statistics::Float(stats) = stats {
        // `assert_eq!` cannot distinguish -0.0 from +0.0,
        // so the sign of each bound is checked explicitly.
        assert_eq!(stats.min(), &-0.0);
        assert!(stats.min().is_sign_negative());
        assert_eq!(stats.max(), &0.0);
        assert!(stats.max().is_sign_positive());
    } else {
        panic!("expecting Statistics::Float");
    }
}
```

Run:

```
parquet$ cargo test --lib column::writer::tests::test_float_statistics_zero_only -- --nocapture --exact
    Finished test [unoptimized + debuginfo] target(s) in 0.06s
     Running unittests src/lib.rs (/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/deps/parquet-b466af98c7f74484)

running 1 test
thread 'column::writer::tests::test_float_statistics_zero_only' panicked at parquet/src/column/writer/mod.rs:2121:13:
assertion failed: stats.min().is_sign_negative()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test column::writer::tests::test_float_statistics_zero_only ... FAILED

failures:

failures:
    column::writer::tests::test_float_statistics_zero_only

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 617 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`
```

**Expected behavior**

The test should pass.
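In the reproducing test, `assert_eq!(stats.min(), &-0.0)` passes even though the bug is present, because IEEE 754 equality (Rust's `PartialEq` for `f32`) treats `+0.0` and `-0.0` as equal; only the `is_sign_negative()` check fails. A minimal sketch of a stricter check uses the standard library's `f32::total_cmp` (stable since Rust 1.62), which orders `-0.0` strictly before `+0.0`. The name `zero_stats_match_spec` is hypothetical and not part of the parquet crate.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: true only if `min` is exactly -0.0 and `max` is
/// exactly +0.0. `total_cmp` returns `Equal` only for identical values,
/// so it distinguishes the two signed zeros, unlike `==`.
fn zero_stats_match_spec(min: f32, max: f32) -> bool {
    min.total_cmp(&-0.0) == Ordering::Equal && max.total_cmp(&0.0) == Ordering::Equal
}
```

For example, `zero_stats_match_spec(0.0, 0.0)` returns `false`, which is the situation the failing assertion reports.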
