A note about bloom filters in parquet-mr (the Java implementation): bloom filters are implemented and complete on the master branch but have not been released yet. I hope we can start the release process soon so that 1.12.0 will include bloom filters.
Cheers,
Gabor

On Tue, Jan 26, 2021 at 5:47 AM Micah Kornfield wrote:

> Welcome Viviana,
>
> I think taking a look at https://issues.apache.org/jira/browse/PARQUET-41
> and its sub-issues should give you a sense of the current implementation.
> Java seems to have an implementation.
>
> The Python implementation of Parquet is a binding on top of the C++
> implementation. Bloom filters haven't been implemented in C++ to my
> knowledge.
>
> Hope this helps.
>
> -Micah
>
> On Mon, Jan 25, 2021 at 9:05 AM Viviana Elizabeth Romero Noguera wrote:
>
> > Hi.
> >
> > I am a doctoral student at ICMC - USP in Brazil.
> > I am looking to work with Apache Parquet, programming in Java
> > or Python.
> > Has the bloom filter already been implemented? At what level: row groups,
> > column chunks, or pages? In which version of Parquet is it implemented?
> >
> > I'm looking for ideas on what to contribute.
> >
> > Thanks.
> >
> > Viviana Noguera.
