asfimport commented on issue #407: URL: https://github.com/apache/parquet-format/issues/407#issuecomment-2184154170
[Gabor Szadovszky](https://issues.apache.org/jira/browse/PARQUET-2255?#comment-17699732) / @gszadovszky: But we build the dictionary for encoding, not for filtering. We should not add anything to it beyond what we have in the pages, so any special handling belongs in the read path.

Maybe we do not need to handle +0.0 and -0.0 differently from the other values. (We needed to handle them separately for min/max values because the comparison is not trivial and there were actual issues.) Anyone who deals with FP numbers should know about the difference between +0.0 and -0.0.

Because the FP spec allows multiple NaN values (even though Java uses a single bit pattern for it), we need to avoid using the Bloom filter in this case. The dictionary is a different matter: we deserialize it to Java Double/Float values in a Set, so we will have one NaN value, which is the very same one we are searching for. (Dealing with NaN is more of a concern for the other implementations, if the language has several NaN values.)
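To illustrate the Set behaviour described above, here is a minimal Java sketch. It is not parquet-mr code: the class `FloatingPointLookupSketch` and the helper `decodeDictionary` are hypothetical names, and the `HashSet<Double>` only stands in for a decoded dictionary. It relies on the documented behaviour of `Double.equals` and `Double.hashCode`, which compare via `Double.doubleToLongBits` and therefore treat every NaN as equal while keeping +0.0 and -0.0 distinct.

```java
import java.util.HashSet;
import java.util.Set;

class FloatingPointLookupSketch {

    // Hypothetical stand-in for decoding a dictionary page into boxed values.
    static Set<Double> decodeDictionary(long... rawBits) {
        Set<Double> values = new HashSet<>();
        for (long bits : rawBits) {
            values.add(Double.longBitsToDouble(bits)); // autoboxed to Double
        }
        return values;
    }

    public static void main(String[] args) {
        long canonicalNaN = 0x7ff8000000000000L; // the pattern behind Double.NaN
        long payloadNaN = 0x7ff8000000000001L;   // another valid NaN encoding
        long negativeZero = 0x8000000000000000L; // -0.0

        // As raw encodings the two NaNs differ, so anything that hashes the
        // encoded bytes would treat them as different values.
        System.out.println("raw NaN encodings equal: " + (canonicalNaN == payloadNaN));

        Set<Double> dictionary = decodeDictionary(payloadNaN, negativeZero);

        // Boxed lookup: any NaN matches Double.NaN.
        System.out.println("contains NaN: " + dictionary.contains(Double.NaN));

        // Primitive comparison says the zeros are equal, the boxed Set does not.
        System.out.println("0.0 == -0.0: " + (0.0 == -0.0));
        System.out.println("contains -0.0: " + dictionary.contains(-0.0));
        System.out.println("contains +0.0: " + dictionary.contains(0.0));
    }
}
```

Under these assumptions a NaN lookup against the boxed Set succeeds regardless of which NaN encoding was written, while a lookup for +0.0 does not match a stored -0.0.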
