yabola commented on code in PR #1043:
URL: https://github.com/apache/parquet-mr/pull/1043#discussion_r1152144788
##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##########
@@ -97,7 +97,7 @@ abstract class ColumnWriterBase implements ColumnWriter {
int optimalNumOfBits =
BlockSplitBloomFilter.optimalNumOfBits(ndv.getAsLong(), fpp.getAsDouble());
this.bloomFilter = new BlockSplitBloomFilter(optimalNumOfBits / 8,
maxBloomFilterSize);
} else {
- this.bloomFilter = new BlockSplitBloomFilter(maxBloomFilterSize);
+ this.bloomFilter = BlockSplitBloomFilter.of(maxBloomFilterSize);
Review Comment:
I referred to the documentation here:
https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach
and learned that the Parquet implementation is based on this paper:
http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf
Impala has the same implementation, but I'm not familiar with it. I saw
similar implementation logic here:
https://github.com/apache/impala/blob/2c779939dc302be9ee5dd97ddf374bb043040891/be/src/kudu/util/block_bloom_filter.h#L88-L92
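
For context, here is a minimal sketch of the split-block approach as I understand it from the parquet-format document linked above. It is not the code in `BlockSplitBloomFilter`. The class name `SplitBlockBloomSketch` and its methods are hypothetical, and the caller is assumed to supply a 64-bit hash that has already been computed (the spec uses xxHash64; hashing is left out here). Each block is 256 bits (eight 32-bit words). The upper 32 bits of the hash select a block, and the lower 32 bits set one bit in each of that block's words.

```java
// Illustrative sketch only; names are hypothetical, not the parquet-mr API.
class SplitBlockBloomSketch {
    // Salt constants as listed in the parquet-format BloomFilter.md document.
    private static final int[] SALT = {
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31
    };

    private static final int WORDS_PER_BLOCK = 8;   // 8 x 32 bits = 256 bits
    private static final int BYTES_PER_BLOCK = 32;

    private final int numBlocks;
    private final int[] words;

    SplitBlockBloomSketch(int numBytes) {
        // Round down to whole blocks, keeping at least one block.
        this.numBlocks = Math.max(1, numBytes / BYTES_PER_BLOCK);
        this.words = new int[numBlocks * WORDS_PER_BLOCK];
    }

    // Map the upper 32 bits of the hash onto [0, numBlocks) without a modulo.
    private int blockIndex(long hash) {
        long upper = hash >>> 32;
        return (int) ((upper * numBlocks) >>> 32);
    }

    // For word i, pick one of its 32 bits from the top 5 bits of key * SALT[i].
    private static int bitFor(int key, int i) {
        return 1 << ((key * SALT[i]) >>> 27);
    }

    void insert(long hash) {
        int base = blockIndex(hash) * WORDS_PER_BLOCK;
        int key = (int) hash;   // lower 32 bits
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            words[base + i] |= bitFor(key, i);
        }
    }

    boolean mightContain(long hash) {
        int base = blockIndex(hash) * WORDS_PER_BLOCK;
        int key = (int) hash;
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            if ((words[base + i] & bitFor(key, i)) == 0) {
                return false;   // definitely absent
            }
        }
        return true;            // present, or a false positive
    }
}
```

Because every lookup touches a single 32-byte block, a query costs at most one cache miss, which is the main point of the cache-efficient design described in the paper.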