Re: [I] BloomFilter and float point is ambiguous [parquet-format]

via GitHub Sat, 22 Jun 2024 23:31:55 -0700


asfimport commented on issue #407:
URL: https://github.com/apache/parquet-format/issues/407#issuecomment-2184154170


   [Gabor Szadovszky](https://issues.apache.org/jira/browse/PARQUET-2255?#comment-17699732) / @gszadovszky:
   But we build the dictionary for encoding, not for filtering. We should 
not add anything to it beyond what is in the pages, so any special handling 
belongs on the read path.
   
   Maybe we do not need to handle +0.0 and -0.0 differently from the other 
values. (We needed to handle them separately for min/max values because the 
comparison is not trivial and there were actual issues.) If someone deals with 
FP numbers they should know about the difference between +0.0 and -0.0. 
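
To illustrate the +0.0 / -0.0 point, here is a minimal Java sketch. The class and method names (`SignedZeroDemo`, `sameBits`) are hypothetical and not taken from parquet-java. It only shows that the two zeros compare equal numerically but differ at the bit level, so anything keyed on the encoded bytes (as a Bloom filter hash presumably is) would treat them as distinct values.

```java
final class SignedZeroDemo {
    // Compares the raw IEEE 754 bit patterns rather than the numeric values.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        double pos = 0.0;
        double neg = -0.0;
        System.out.println(pos == neg);                      // true: numeric comparison
        System.out.println(sameBits(pos, neg));              // false: the sign bit differs
        System.out.println(Double.valueOf(pos).equals(neg)); // false: boxed equality is bit-based
    }
}
```

So a reader looking up `0.0` would not match a stored `-0.0` unless it probes for both, which is the difference a user of FP values is expected to be aware of.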
   
   Because the FP spec allows multiple NaN values (even though Java uses a 
single bit pattern for it), we need to avoid using the Bloom filter in this case. 
The dictionary is a different matter: we deserialize it to Java Double/Float 
values in a Set, so we end up with a single NaN value that is the very same one 
we are searching for. (Handling NaN is more of a concern for other 
implementations, where the language may have several NaN values.)
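
The NaN argument can be sketched in Java as follows. This is an illustration only: `NaNDemo`, `rawBitsEqual` and `setContains` are hypothetical names, and the second NaN is built by hand with an arbitrary payload to stand in for a value written by another implementation. It assumes the JVM preserves the payload of a quiet NaN, which is the usual behaviour but is not guaranteed on every platform.

```java
import java.util.HashSet;
import java.util.Set;

final class NaNDemo {
    // Java's canonical NaN (bit pattern 0x7ff8000000000000).
    static final double CANONICAL_NAN = Double.NaN;
    // Another valid quiet NaN with a non-zero payload (assumed example value).
    static final double OTHER_NAN = Double.longBitsToDouble(0x7ff8000000000001L);

    // Bit-level comparison: what a hash over the encoded bytes would distinguish.
    static boolean rawBitsEqual(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // Set<Double> lookup: Double.equals and hashCode collapse every NaN into one value.
    static boolean setContains(double stored, double probe) {
        Set<Double> dictionary = new HashSet<>();
        dictionary.add(stored);
        return dictionary.contains(probe);
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(OTHER_NAN));
        System.out.println(rawBitsEqual(CANONICAL_NAN, OTHER_NAN));
        System.out.println(setContains(OTHER_NAN, CANONICAL_NAN));
    }
}
```

Under that assumption the two NaNs differ in their raw bits, so a byte-level hash would not match them, while the `Set<Double>` lookup still succeeds. That is why the dictionary can answer a NaN query but the Bloom filter cannot be trusted to.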


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

