alamb opened a new issue, #8156: URL: https://github.com/apache/arrow-rs/issues/8156
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

- Related to https://github.com/apache/parquet-format/issues/406

@JFinis has been working on a proposal to better store statistics for floating point values in Parquet. The most recent proposal is here:
- https://github.com/apache/parquet-format/pull/514

In order to change the format, there need to be at least 2 open source implementations of a proposal.

There is also some question (see this link from @tustvold) about how complex this would be to implement / get right.

**Describe the solution you'd like**

I would like to implement a draft of the specification in https://github.com/apache/parquet-format/pull/514 in arrow-rs, to show it is possible and to keep the Rust implementation on the leading edge of implementation.

**Describe alternatives you've considered**

- @etseidl has implemented the IEEE 754 total order in a draft PR here: https://github.com/apache/arrow-rs/pull/7408

We would also need to implement the `nan_count` field, along with filtering out NaNs when writing statistics for floats.

Some good tests would be to:

1. Write floating point data (specified below) to a Parquet file
2. Read the metadata back and verify the min/max values and `nan_count` for the following cases:
   1. A column with no NaN values
   2. A column with a single +NaN value (should not appear in stats)
   3. A column with a single -NaN value (should not appear in stats)
   4. A column of *only* NaN values
   5. A column with Inf and some +/- NaNs
   6. A column with -Inf and some +/- NaNs

**Additional context**

* Original JIRA issue: https://issues.apache.org/jira/browse/PARQUET-2249
* Mailing list discussion: https://lists.apache.org/thread/lzh0dvrvnsy8kvflvl61nfbn6f9js81s
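As a rough illustration of the `nan_count` and NaN-filtering behaviour that the proposed tests would check, here is a minimal standalone sketch using only the Rust standard library. `FloatStats` and `compute_float_stats` are hypothetical names, not part of arrow-rs or the parquet crate, and leaving min/max unset for an all-NaN column is an assumption of this sketch rather than something taken from the proposal. Non-NaN values are compared with `f64::total_cmp`, which implements the IEEE 754 total order.

```rust
/// Hypothetical summary of one float column; not an arrow-rs type.
#[derive(Debug, PartialEq)]
struct FloatStats {
    /// Smallest non-NaN value, or None if there is none (assumption of this sketch).
    min: Option<f64>,
    /// Largest non-NaN value, or None if there is none (assumption of this sketch).
    max: Option<f64>,
    /// Number of NaN values seen, regardless of sign bit.
    nan_count: u64,
}

/// Counts NaNs separately and keeps them out of min/max.
/// Remaining values are ordered with the IEEE 754 total order.
fn compute_float_stats(values: &[f64]) -> FloatStats {
    let mut stats = FloatStats { min: None, max: None, nan_count: 0 };
    for &v in values {
        if v.is_nan() {
            // Both +NaN and -NaN land here and never reach min/max.
            stats.nan_count += 1;
            continue;
        }
        stats.min = Some(match stats.min {
            Some(m) if m.total_cmp(&v).is_le() => m,
            _ => v,
        });
        stats.max = Some(match stats.max {
            Some(m) if m.total_cmp(&v).is_ge() => m,
            _ => v,
        });
    }
    stats
}
```

Under these assumptions, a column such as `[1.0, f64::NAN, -2.0]` yields a min of `-2.0`, a max of `1.0`, and a `nan_count` of 1, while a column containing only NaNs yields no min/max and a `nan_count` equal to its length. A real implementation would also need to cover `f32` and `f16`, and follow whatever the final specification says about all-NaN columns.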