[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677058#comment-17677058 ]
ASF GitHub Bot commented on PARQUET-2226:
-----------------------------------------

yabola commented on code in PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070637029


##########
parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java:
##########
@@ -394,4 +395,24 @@ public long hash(float value) {
   public long hash(Binary value) {
     return hashFunction.hashBytes(value.getBytes());
   }
+
+  @Override
+  public void merge(BloomFilter otherBloomFilter) throws IOException {
+    Preconditions.checkArgument(otherBloomFilter != null, "Cannot merge a null BloomFilter");
+    Preconditions.checkArgument((getAlgorithm() == otherBloomFilter.getAlgorithm()),
+      String.format("BloomFilters must have the same algorithm (%s != %s)",
+        getAlgorithm(), otherBloomFilter.getAlgorithm()));
+    Preconditions.checkArgument((getHashStrategy() == otherBloomFilter.getHashStrategy()),
+      String.format("BloomFilters must have the same hashStrategy (%s != %s)",
+        getHashStrategy(), otherBloomFilter.getHashStrategy()));
+    Preconditions.checkArgument((getBitsetSize() == otherBloomFilter.getBitsetSize()),
+      String.format("BloomFilters must have the same size of bitsets (%s != %s)",
+        getBitsetSize(), otherBloomFilter.getBitsetSize()));
+    ByteArrayOutputStream otherOutputStream = new ByteArrayOutputStream();
+    otherBloomFilter.writeTo(otherOutputStream);
+    byte[] otherBits = otherOutputStream.toByteArray();

Review Comment:
   I already check `getBitsetSize` above, so we may not need to check `bitset.length` as well?



> Support merge Bloom Filter
> --------------------------
>
>                 Key: PARQUET-2226
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2226
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> We need to collect the Parquet bloom filters of multiple files and then
> synthesize a more comprehensive bloom filter for common use.
> Guava supports a similar API operation:
> https://guava.dev/releases/31.0.1-jre/api/docs/src-html/com/google/common/hash/BloomFilter.html#line.252



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
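
For readers following the review thread: the quoted hunk stops right after `otherBits` is read, so the merge step itself is not shown in this message. As a minimal sketch of the general technique (two Bloom filters with the same size and hash function can be combined by a bitwise OR of their bitsets), the standalone class below may help. The class name `BloomBitsetMergeSketch` and the method `mergeBitsets` are hypothetical and not part of parquet-mr; the explicit length check is included only to illustrate the question raised in the review comment, not to state how the PR resolved it.

```java
import java.util.Arrays;

// Hypothetical, standalone illustration; not the parquet-mr implementation.
public class BloomBitsetMergeSketch {

    /** Returns a new array holding the bitwise OR of two equally sized bitsets. */
    public static byte[] mergeBitsets(byte[] bits, byte[] otherBits) {
        if (bits == null || otherBits == null) {
            throw new IllegalArgumentException("Cannot merge a null bitset");
        }
        // Assumption: sizes must match exactly, mirroring the getBitsetSize() check in the diff.
        if (bits.length != otherBits.length) {
            throw new IllegalArgumentException(String.format(
                "Bitsets must have the same size (%d != %d)", bits.length, otherBits.length));
        }
        byte[] merged = new byte[bits.length];
        for (int i = 0; i < bits.length; i++) {
            // A bit is set in the result if it is set in either input.
            merged[i] = (byte) (bits[i] | otherBits[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] a = {0b0000_0101, 0b0000_0000};
        byte[] b = {0b0000_0011, (byte) 0b1000_0000};
        // prints "[7, -128]"
        System.out.println(Arrays.toString(mergeBitsets(a, b)));
    }
}
```

Because OR only ever sets bits, the merged bitset reports "possibly present" for every value that either input would have reported, which is the property the issue description asks for when combining filters from multiple files.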