tustvold commented on PR #221:
URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2943289604

   To echo what has been said before, and reiterated by @JFinis, the total order 
mechanism is a relatively straightforward fix that is not only easy to 
implement but also easy to **explain to users**, if it even needs explaining. A 
NaN just becomes a very big value, and a negative NaN a very small value. Yes, 
this means a NaN may limit the ability to prune using statistics, but this is no 
different from an abnormally large or small value. IMO this is pretty intuitive.
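
   To make the "very big / very small value" point concrete, here is a minimal sketch of min/max statistics computed under a total order. It uses Rust's standard `f64::total_cmp` (IEEE 754 totalOrder) as a stand-in; the helper name `total_order_min_max` is hypothetical and is not taken from the PR or from any parquet implementation.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: min and max of a slice under IEEE 754 totalOrder.
/// Under this order: -NaN < -inf < finite values < +inf < +NaN.
fn total_order_min_max(values: &[f64]) -> Option<(f64, f64)> {
    let mut iter = values.iter().copied();
    // An empty slice has no statistics.
    let first = iter.next()?;
    Some(iter.fold((first, first), |(min, max), v| {
        (
            // Keep `v` as the new min only if it sorts strictly below the current min.
            if v.total_cmp(&min) == Ordering::Less { v } else { min },
            // Keep `v` as the new max only if it sorts strictly above the current max.
            if v.total_cmp(&max) == Ordering::Greater { v } else { max },
        )
    }))
}
```

   With a positive NaN in the input the max becomes that NaN, and with a negative NaN the min becomes that NaN, so a reader can still prune on the other bound, much as it would with any other extreme value.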
   
   That's not to say the nan counts proposal is without merit; I could 
definitely see it being useful for engines that order NaNs differently. 
However, it is more complex for both users and parquet readers to reason about.
   
   IMO we have a proposal with relatively broad consensus and multiple 
implementations, and I think it is worthwhile to bring it to a broader audience 
in the form of a vote. I don't see adopting total ordering as a one-way door; we 
can always add a nan count mechanism later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

