[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703052#comment-17703052
]
ASF GitHub Bot commented on PARQUET-2260:
-----------------------------------------
wgtmac commented on code in PR #1043:
URL: https://github.com/apache/parquet-mr/pull/1043#discussion_r1142934039
##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##########
@@ -97,7 +97,7 @@ abstract class ColumnWriterBase implements ColumnWriter {
int optimalNumOfBits =
BlockSplitBloomFilter.optimalNumOfBits(ndv.getAsLong(), fpp.getAsDouble());
this.bloomFilter = new BlockSplitBloomFilter(optimalNumOfBits / 8,
maxBloomFilterSize);
} else {
- this.bloomFilter = new BlockSplitBloomFilter(maxBloomFilterSize);
+ this.bloomFilter = BlockSplitBloomFilter.of(maxBloomFilterSize);
Review Comment:
https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java#L146
The code linked above already rounds the size up to the next power of two and
also guarantees that the result does not exceed the max size.
BTW, the spec does not require the size to be a power of two. Have you seen
any issue when the size is not a power of two?
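A minimal sketch of the sizing rule as the comment describes it: round the requested size up to the next power of two, then clamp it to the max. This is not the parquet-mr source; the class and method names (`RoundUpThenCap`, `nextPowerOfTwo`, `roundUpThenCap`) are hypothetical and only illustrate the arithmetic, assuming positive sizes no larger than 2^30.

```java
// Hypothetical illustration of "round up to the next power of two, but never
// exceed the max size". NOT the parquet-mr implementation; names are made up.
final class RoundUpThenCap {

  // Smallest power of two >= n, assuming 1 <= n <= 2^30.
  static int nextPowerOfTwo(int n) {
    if ((n & (n - 1)) == 0) {
      return n; // already a power of two
    }
    return Integer.highestOneBit(n) << 1;
  }

  // Round up first, then clamp to maxBytes. Because of the clamp, the result
  // may not be a power of two when maxBytes itself is not one.
  static int roundUpThenCap(int numBytes, int maxBytes) {
    return Math.min(nextPowerOfTwo(numBytes), maxBytes);
  }

  public static void main(String[] args) {
    int max = 1024 * 1024 + 1; // 1048577, not a power of two
    System.out.println(roundUpThenCap(max, max));  // 1048577 (clamped)
    System.out.println(roundUpThenCap(1000, max)); // 1024
  }
}
```

With a non-power-of-two max, the clamp yields a size that is not a power of two, which is exactly the case the question above asks about.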
> Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration
> ------------------------------------------------------------------------------------
>
> Key: PARQUET-2260
> URL: https://issues.apache.org/jira/browse/PARQUET-2260
> Project: Parquet
> Issue Type: Bug
> Reporter: Mars
> Assignee: Mars
> Priority: Major
>
> Before this PR: if {{parquet.bloom.filter.max.bytes}} is not a power of 2,
> the generated Bloom filter exceeds this value. For example, if
> {{parquet.bloom.filter.max.bytes}} is set to 1024 * 1024 + 1 = 1048577, the
> generated Bloom filter is 1024 * 1024 * 2 = 2097152 bytes. This does not
> match the definition of the parameter.
> After this PR: the size is set to the largest power of two that does not
> exceed {{parquet.bloom.filter.max.bytes}}, which in this example is
> 1024 * 1024 = 1048576.
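The before/after arithmetic in the issue description can be reproduced with `Integer.highestOneBit`. This is a hypothetical sketch, not the code from PR #1043: `sizeBefore` models "round the configured max up to the next power of two" and `sizeAfter` models "take the largest power of two that does not exceed it", assuming a positive `maxBytes` no larger than 2^30.

```java
// Hypothetical illustration of the PARQUET-2260 sizing arithmetic.
// NOT the parquet-mr implementation; class and method names are made up.
final class BloomSizeExample {

  // Before: next power of two >= maxBytes, which can exceed maxBytes.
  static int sizeBefore(int maxBytes) {
    if ((maxBytes & (maxBytes - 1)) == 0) {
      return maxBytes; // already a power of two
    }
    return Integer.highestOneBit(maxBytes) << 1;
  }

  // After: largest power of two <= maxBytes, which never exceeds maxBytes.
  static int sizeAfter(int maxBytes) {
    return Integer.highestOneBit(maxBytes);
  }

  public static void main(String[] args) {
    int maxBytes = 1024 * 1024 + 1; // 1048577
    System.out.println(sizeBefore(maxBytes)); // 2097152
    System.out.println(sizeAfter(maxBytes));  // 1048576
  }
}
```

Rounding down rather than clamping keeps the size a power of two while still respecting the configured limit.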
--
This message was sent by Atlassian Jira
(v8.20.10#820010)