[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699712#comment-17699712 ]

Gabor Szadovszky commented on PARQUET-2255:
-------------------------------------------

Bloom filters are for searching for exact values. Exact equality checks on 
floating point numbers are usually a code smell; checking whether the difference 
is below an epsilon is generally recommended instead. I am wondering whether 
there is a real use case for searching for an exact floating point number. 
Maybe disabling bloom filters completely for FP numbers is the simplest choice 
and probably won't bother anyone.
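A Bloom filter probe is an exact, hash-based membership test on the encoded value, so it can only answer "might this exact value be present?", never "might something within epsilon of it be present?". The small Python sketch below illustrates the gap. It assumes the filter hashes the plain little-endian IEEE 754 bytes of a DOUBLE; `f64_bytes` is a hypothetical helper written for this example.

```python
import math
import struct

def f64_bytes(x: float) -> bytes:
    # Assumed input to the filter's hash: little-endian IEEE 754 bytes of a DOUBLE.
    return struct.pack("<d", x)

a = 0.1 + 0.2
b = 0.3

print(a == b)                            # False: exact equality fails
print(math.isclose(a, b, rel_tol=1e-9))  # True: tolerance-based comparison succeeds
print(f64_bytes(a) == f64_bytes(b))      # False: different bytes, hence a different hash
```

Because the two byte strings differ, a filter populated with `0.1 + 0.2` gives no useful answer for a probe with `0.3`, even though an epsilon comparison treats them as the same number.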

If we still want to handle FP bloom filters, I agree with [~wgtmac]'s proposal. 
(It is similar to the approach we implemented for min/max values.) Keep in mind 
that we need to handle the case when someone wants to filter on a NaN.
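One possible reading of the normalization idea is to canonicalize floating point values before hashing, so that values which compare equal (+0 and -0), or which all mean NaN, hash identically on both the write and the read side. The sketch below is hypothetical: `canonical_f64_bytes` and the choice of 0x7FF8000000000000 as the canonical NaN are assumptions for illustration, not the exact proposal referenced above and not something the Parquet spec defines.

```python
import math
import struct

# Hypothetical canonical NaN: the common quiet-NaN bit pattern.
CANONICAL_NAN_BITS = 0x7FF8000000000000

def canonical_f64_bytes(x: float) -> bytes:
    """Bytes to hash for a DOUBLE after normalization (a sketch, not the spec).

    - -0.0 and +0.0 map to the same bytes (those of +0.0), since they compare equal.
    - Every NaN, whatever its sign or payload, maps to one canonical pattern,
      so a query that filters on NaN can still be answered by the filter.
    """
    if math.isnan(x):
        return struct.pack("<Q", CANONICAL_NAN_BITS)
    if x == 0.0:
        return struct.pack("<d", 0.0)  # collapses -0.0 into +0.0
    return struct.pack("<d", x)
```

The design only works if writers and readers apply the same function: a reader filtering on NaN or on -0.0 must probe with the canonical bytes, otherwise it would get false negatives.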



> BloomFilter and float point is ambiguous
> ----------------------------------------
>
>                 Key: PARQUET-2255
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2255
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Xuwei Fu
>            Priority: Major
>             Fix For: format-2.9.0
>
>
> Currently, Parquet can use a BloomFilter for any physical type. However, 
> when a BloomFilter is applied to floating point values:
>  # What do +0 and -0 mean? Are they equal?
>  # Should qNaN and sNaN be written to the BloomFilter? Are they equal?
>  
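Both questions come down to IEEE 754 bit patterns: +0 and -0 compare equal yet differ in the sign bit, and NaN covers many encodings, split into quiet and signaling by the top mantissa bit (the IEEE 754-2008 recommended convention, used on x86 and ARM). A filter that hashes raw bytes therefore treats these values as distinct unless they are normalized first. A small Python illustration; `f64_bits` and `classify_nan_bits` are hypothetical helpers written for this example.

```python
import struct

def f64_bits(x: float) -> int:
    # Raw IEEE 754 bit pattern of a double, as an unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify_nan_bits(bits: int) -> str:
    # NaN: all 11 exponent bits set and a non-zero 52-bit mantissa.
    # The top mantissa bit (bit 51) is 1 for quiet NaNs and 0 for signaling NaNs.
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent != 0x7FF or mantissa == 0:
        return "not NaN"
    return "qNaN" if mantissa & (1 << 51) else "sNaN"

# +0 and -0 compare equal but are encoded differently.
print(0.0 == -0.0)          # True
print(hex(f64_bits(0.0)))   # 0x0
print(hex(f64_bits(-0.0)))  # 0x8000000000000000

# NaN patterns are given as integers to avoid any hardware quieting of sNaN.
print(classify_nan_bits(0x7FF8000000000000))  # qNaN
print(classify_nan_bits(0x7FF0000000000001))  # sNaN
print(classify_nan_bits(0x7FF0000000000000))  # not NaN (this is +infinity)
```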



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
