orlp commented on PR #221: URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2931447255
There is another issue with this proposal, in my opinion: it adds semantics to the sign bit of `NaN`s. This is incredibly dangerous, because not all data systems consider `NaN` payloads or the sign of a `NaN` worth preserving (e.g. Polars, though the problem goes all the way down to the hardware itself). If for whatever reason the codepath computing the `min` statistic does not preserve `NaN` signs in exactly the same way as the codepath computing the `max` statistic, you may end up in a scenario where both discard the `NaN`. The statistics would then state there are no `NaN`s when there very well could be.
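A minimal Python sketch of that failure mode follows. It assumes the proposal orders floats by IEEE 754 totalOrder (negative-signed `NaN`s below `-inf`, positive-signed `NaN`s above `+inf`). `total_order_key`, `canonicalize_nan`, `column_min` and `column_max` are hypothetical helpers for illustration, not Parquet or Polars APIs. The `min` path canonicalizes every `NaN` to a positive one. The `max` path keeps the raw sign bit. Each path then pushes the `NaN` to the end it is not looking at.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret an unsigned 64-bit pattern as a float64."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


POS_NAN = bits_to_float(0x7FF8_0000_0000_0000)  # quiet NaN, sign bit clear
NEG_NAN = bits_to_float(0xFFF8_0000_0000_0000)  # quiet NaN, sign bit set


def total_order_key(x: float) -> int:
    """Sort key matching IEEE 754 totalOrder (assumed ordering for the proposal):
    -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN.
    """
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # signed view of the bits
    # Sign bit set: flip the 63 low bits so larger magnitudes sort lower.
    return bits ^ 0x7FFF_FFFF_FFFF_FFFF if bits < 0 else bits


def canonicalize_nan(x: float) -> float:
    """Hypothetical codepath that rewrites every NaN to the positive quiet NaN."""
    return POS_NAN if math.isnan(x) else x


def column_min(values, preprocess=lambda v: v):
    """Minimum under totalOrder after applying a per-value preprocessing step."""
    return min((preprocess(v) for v in values), key=total_order_key)


def column_max(values, preprocess=lambda v: v):
    """Maximum under totalOrder after applying a per-value preprocessing step."""
    return max((preprocess(v) for v in values), key=total_order_key)


column = [1.0, NEG_NAN, 2.0]  # contains one NaN with its sign bit set

# Consistent handling: the raw -NaN sorts first, so it shows up as the min.
assert math.isnan(column_min(column))

# Inconsistent handling: the min path canonicalizes NaNs, the max path does not.
lo = column_min(column, preprocess=canonicalize_nan)  # NaN became +NaN, sorts last
hi = column_max(column)                               # raw -NaN sorts first
print(lo, hi)  # 1.0 2.0; neither bound reveals the NaN
```

If both paths apply the same treatment, the `NaN` lands in one of the two bounds. Only the mismatch hides it. That is why the sign bit becomes load-bearing under this assumed ordering.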
