tustvold commented on PR #221: URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2943289604
To echo what has been said before, and reiterated by @JFinis, the total order mechanism is a relatively straightforward fix that is not only easy to implement but also easy to **explain to users**, if it even needs explaining. A NaN just becomes a very big value, and a negative NaN a very small value. Yes, this means a NaN may limit the ability to prune using statistics, but this is no different from an abnormally large or small value. IMO this is pretty intuitive.

That's not to say the NaN counts proposal is without merit. I could definitely see it being useful for engines that order NaNs differently. However, it is more complex for both users and parquet readers to reason about.

IMO we have a proposal with relatively broad consensus and multiple implementations, and I think it is worthwhile to bring it to a broader audience in the form of a vote. I don't see adopting total ordering as a one-way door; we can always add a NaN count mechanism later.
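To illustrate the point about NaN behaving like an extreme value, here is a minimal Python sketch. It assumes the total order in question behaves like IEEE 754 `totalOrder` on 64-bit floats; the exact rules are whatever the PR specifies. The names `total_order_key`, `min_max`, and `can_skip_greater_than` are hypothetical helpers for illustration, not part of any Parquet library.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a float64 to an int whose ordering follows IEEE 754 totalOrder (assumed)."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative floats have the sign bit set; flip the lower 63 bits so that
    # larger magnitudes (including negative NaN) sort lower.
    if bits < 0:
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits


# Build NaNs from explicit bit patterns so the sign is unambiguous.
POS_NAN = struct.unpack("<d", struct.pack("<Q", 0x7FF8_0000_0000_0000))[0]
NEG_NAN = struct.unpack("<d", struct.pack("<Q", 0xFFF8_0000_0000_0000))[0]


def min_max(values):
    """Hypothetical statistics writer: min and max under the total order."""
    return min(values, key=total_order_key), max(values, key=total_order_key)


def can_skip_greater_than(stat_max: float, threshold: float) -> bool:
    """Hypothetical pruning check for `x > threshold`, compared under the total order."""
    return total_order_key(stat_max) <= total_order_key(threshold)


ordered = sorted(
    [POS_NAN, 1.0, NEG_NAN, -math.inf, math.inf, -0.0, 0.0],
    key=total_order_key,
)

clean = [-2.0, 1.5, 3.0]
_, clean_max = min_max(clean)
_, nan_max = min_max(clean + [POS_NAN])
_, outlier_max = min_max(clean + [1e308])

skip_clean = can_skip_greater_than(clean_max, 10.0)      # chunk can be pruned
skip_nan = can_skip_greater_than(nan_max, 10.0)          # NaN max blocks pruning
skip_outlier = can_skip_greater_than(outlier_max, 10.0)  # so does a huge outlier
```

In this sketch a positive NaN in the chunk defeats pruning in exactly the same way as the `1e308` outlier does, which is the sense in which a NaN is "just a very big value".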
