[ 
https://issues.apache.org/jira/browse/PARQUET-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061641#comment-17061641
 ] 

Gabor Szadovszky commented on PARQUET-1815:
-------------------------------------------

If one would like to use bloom filters outside the scope of parquet-mr 
(e.g. to union the bloom filters of several files for a partition of a table), 
then I think exposing this through the Parquet BloomFilter interface is not a 
good idea. E.g. Iceberg supports the file formats Avro, Parquet and ORC, and 
ORC also has its own bloom filter implementation. To support this example 
scenario in Iceberg, it would be better to use a common bloom filter 
interface that is not part of the Parquet API.

I am not against implementing this functionality in parquet-mr (it is not 
complex anyway); I just do not see a use case for it yet, and I think it is a 
bit early to implement such functionality without a driving use case.
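
To make the union operation under discussion concrete, here is a minimal 
sketch in Java. It assumes a plain bit-array bloom filter, not the 
split-block layout that Parquet uses, and every name in it 
(SimpleBloomFilter, insert, mightContain, union) is hypothetical rather than 
part of the parquet-mr or ORC APIs. The point it illustrates is that two 
filters built with the same size and the same hash scheme can be merged with 
a bitwise OR, and that filters with different parameters cannot.

```java
import java.util.BitSet;

/** Hypothetical, format-independent bloom filter; NOT the parquet-mr API. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 64-bit mixing function (SplitMix64 finalizer), used as the base hash. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Derives the i-th bit position by double hashing: h1 + i * h2. */
    private int index(long value, int i) {
        long h = mix(value);
        int h1 = (int) h;
        int h2 = ((int) (h >>> 32)) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void insert(long value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** May return false positives, never false negatives. */
    boolean mightContain(long value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    /** Merges other into this filter; both must share size and hash count. */
    void union(SimpleBloomFilter other) {
        if (other.numBits != numBits || other.numHashes != numHashes) {
            throw new IllegalArgumentException("incompatible bloom filter parameters");
        }
        bits.or(other.bits);
    }
}
```

After the union, the merged filter answers mightContain with true for every 
value inserted into either input. Its false positive rate is at least as 
high as that of either input, which is the cost of holding one filter 
instead of several.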

> Add union API to BloomFilter interface
> --------------------------------------
>
>                 Key: PARQUET-1815
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1815
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Junjie Chen
>            Priority: Minor
>              Labels: pull-request-available
>
> Sometimes one may want to build a file-level bloom filter by taking the 
> union of all row groups' bloom filters in order to save memory. Add a union 
> API to make this easy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
