alamb opened a new issue, #182: URL: https://github.com/apache/parquet-site/issues/182
- Part of https://github.com/apache/parquet-site/issues/146
- Related to https://github.com/apache/parquet-format/pull/514
- Related to https://github.com/apache/parquet-format/pull/221
- Related to https://github.com/apache/parquet-format/pull/196

Thanks to the hard work of @JFinis, @wgtmac, @etseidl, and others, we recently updated the Parquet spec to more clearly define how to handle floating point statistics (specifically NaNs and the ordering of -0.0, +0.0, etc.).

I think we should make a post similar to https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/ that describes the backstory of the problem we are solving (in an approachable way) and then the solution that was introduced.

Here is one proposed outline:

1. Summary of the feature
2. Background on Small Materialized Aggregates / statistics for pruning and Parquet including the
3. Background on the wonders of NaNs and floating point not having a defined ordering
4. The problems with the old definition for floating point stats (nicely summarized on https://github.com/apache/parquet-format/pull/514)
5. The new solution

In addition to being a nice reference for implementers, I hope this post continues the story arc that highlights the new additions we are making to the Parquet spec and how they are being adopted (e.g. https://parquet.apache.org/docs/file-format/implementationstatus/).

It would also be a good idea to explain how we arrived at consensus.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
