[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699712#comment-17699712 ]
Gabor Szadovszky commented on PARQUET-2255:
-------------------------------------------

Bloom filters are for searching for exact values. Exact comparison of floating-point numbers is usually a code smell; checking whether the difference is below an epsilon value is generally recommended over exact equality. I am wondering if there is a real use case for searching for an exact floating-point number. Maybe disabling bloom filters completely for FP numbers is the simplest choice and probably won't bother anyone.

If we still want to handle FP bloom filters, I agree with [~wgtmac]'s proposal. (It is similar to the approach we implemented for min/max values.) Keep in mind that we need to handle the case when someone wants to filter on a NaN.

> BloomFilter and float point is ambiguous
> ----------------------------------------
>
>                 Key: PARQUET-2255
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2255
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Xuwei Fu
>            Priority: Major
>             Fix For: format-2.9.0
>
>
> Currently, Parquet can use a BloomFilter for any physical type. However,
> when a BloomFilter is applied to floats:
> # What do +0 and -0 mean? Are they equal?
> # Should qNaN and sNaN be written to the BloomFilter? Are they equal?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
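
To make the two questions in the quoted issue concrete, here is a minimal Python sketch. It assumes a bloom filter that hashes the raw IEEE 754 bytes of each value. The helper names (`raw_bytes`, `canonical_bytes`, `nan_with_payload`) are hypothetical, and the normalization shown is only one possible choice. It is not necessarily the proposal referenced in the comment, nor anything the Parquet format specifies.

```python
import math
import struct


def raw_bytes(x: float) -> bytes:
    """Little-endian IEEE 754 double-precision bytes of x."""
    return struct.pack("<d", x)


def nan_with_payload(payload: int) -> float:
    """Build a quiet NaN carrying the given payload bits (illustration only)."""
    bits = 0x7FF8000000000000 | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def canonical_bytes(x: float) -> bytes:
    """Hypothetical normalization before hashing (an assumption, not Parquet's rule)."""
    if math.isnan(x):
        # Collapse every NaN, whatever its payload, to a single bit pattern.
        return struct.pack("<Q", 0x7FF8000000000000)
    if x == 0.0:
        # Collapse -0.0 to +0.0, since the two compare equal.
        return struct.pack("<d", 0.0)
    return raw_bytes(x)


if __name__ == "__main__":
    # +0.0 and -0.0 compare equal, yet their raw bytes differ.
    print(0.0 == -0.0, raw_bytes(0.0) == raw_bytes(-0.0))
    # After normalization both zeros yield the same bytes to hash.
    print(canonical_bytes(0.0) == canonical_bytes(-0.0))

    # Two NaNs with different payloads have different raw bytes,
    # and a NaN never compares equal to itself.
    nan_a, nan_b = nan_with_payload(0), nan_with_payload(1)
    print(nan_a == nan_a, raw_bytes(nan_a) == raw_bytes(nan_b))
    print(canonical_bytes(nan_a) == canonical_bytes(nan_b))

    # Exact equality is fragile for computed values; an epsilon check is not.
    print(0.1 + 0.2 == 0.3, math.isclose(0.1 + 0.2, 0.3))
```

Under the raw-bytes assumption, a lookup for 0.0 could report "definitely absent" for a row group that contains only -0.0, even though the two values compare equal. Either normalizing both the written and the probed value, or disabling bloom filters for floating-point columns as the comment suggests, would avoid that mismatch.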