yabola commented on code in PR #1043:
URL: https://github.com/apache/parquet-mr/pull/1043#discussion_r1152144788


##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##########
@@ -97,7 +97,7 @@ abstract class ColumnWriterBase implements ColumnWriter {
       int optimalNumOfBits = BlockSplitBloomFilter.optimalNumOfBits(ndv.getAsLong(), fpp.getAsDouble());
       this.bloomFilter = new BlockSplitBloomFilter(optimalNumOfBits / 8, maxBloomFilterSize);
     } else {
-      this.bloomFilter = new BlockSplitBloomFilter(maxBloomFilterSize);
+      this.bloomFilter = BlockSplitBloomFilter.of(maxBloomFilterSize);

Review Comment:
   I referred to the documentation at 
https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach
 and learned that the Bloom filter implementation in Parquet is based on this paper: 
http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf 
   
   
   Impala appears to use the same implementation, though I'm not familiar with its 
code. I saw similar implementation logic here:
   
https://github.com/apache/impala/blob/2c779939dc302be9ee5dd97ddf374bb043040891/be/src/kudu/util/block_bloom_filter.h#L88-L92
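   
   For anyone else not familiar with the paper, here is a minimal sketch of the split-block idea as I understand it from BloomFilter.md: each 256-bit block is eight 32-bit words, the upper 32 bits of the hash select a block, and the lower 32 bits set one bit in each word. The class name `SplitBlockBloomSketch` and its methods are hypothetical and are not the parquet-mr or Impala API. The caller is assumed to supply a 64-bit hash (the spec uses xxHash64, which is not implemented here), and the salt constants are the ones listed in the spec.

```java
/** Hypothetical sketch of a split-block Bloom filter; not the parquet-mr class. */
public class SplitBlockBloomSketch {
    // Salt constants as listed in the parquet-format BloomFilter.md spec.
    private static final int[] SALT = {
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31
    };
    private static final int WORDS_PER_BLOCK = 8; // 8 x 32 bits = one 256-bit block

    private final int[] words;
    private final int numBlocks;

    public SplitBlockBloomSketch(int numBlocks) {
        if (numBlocks <= 0 || numBlocks > Integer.MAX_VALUE / WORDS_PER_BLOCK) {
            throw new IllegalArgumentException("numBlocks out of range: " + numBlocks);
        }
        this.numBlocks = numBlocks;
        this.words = new int[numBlocks * WORDS_PER_BLOCK];
    }

    // Map the upper 32 bits of the hash onto [0, numBlocks) without a modulo.
    private int blockStart(long hash) {
        long upper = hash >>> 32;
        int block = (int) ((upper * numBlocks) >>> 32);
        return block * WORDS_PER_BLOCK;
    }

    // For word i, pick one of its 32 bits from the lower 32 bits of the hash.
    private static int mask(long hash, int i) {
        int key = (int) hash;
        return 1 << ((key * SALT[i]) >>> 27);
    }

    /** Sets one bit in each of the eight words of the selected block. */
    public void insert(long hash) {
        int start = blockStart(hash);
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            words[start + i] |= mask(hash, i);
        }
    }

    /** Returns false only if the hash was definitely never inserted. */
    public boolean mightContain(long hash) {
        int start = blockStart(hash);
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            if ((words[start + i] & mask(hash, i)) == 0) {
                return false;
            }
        }
        return true;
    }
}
```

   Since a lookup only ever touches a single block, each probe stays within one cache line, which I believe is the main point of the paper.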
   
   
   
   


