yabola commented on code in PR #1043:
URL: https://github.com/apache/parquet-mr/pull/1043#discussion_r1145564119
##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##########
@@ -97,7 +97,7 @@ abstract class ColumnWriterBase implements ColumnWriter {
int optimalNumOfBits =
BlockSplitBloomFilter.optimalNumOfBits(ndv.getAsLong(), fpp.getAsDouble());
this.bloomFilter = new BlockSplitBloomFilter(optimalNumOfBits / 8,
maxBloomFilterSize);
} else {
- this.bloomFilter = new BlockSplitBloomFilter(maxBloomFilterSize);
+ this.bloomFilter = BlockSplitBloomFilter.of(maxBloomFilterSize);
Review Comment:
> Please correct me if I have misunderstood. In this part of the code,
> `bitset.length / BYTES_PER_BLOCK` needs to be an integer and a power of 2;
> extra bits are useless, but they also cause no error.
@wgtmac @gszadovszky Sorry, I may have misunderstood earlier. `bitset.length`
is valid as long as it is a multiple of `BYTES_PER_BLOCK` (32); it does not
have to be a power of 2.
The reason for using a power of 2 here is more complicated; I think it was
done for performance reasons.
Most existing Parquet usages either specify the NDV value or fall back to the
default maxBytes of 1 MB. In both cases, `numbytes` is always initialized to a
power of 2.
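To make the distinction concrete, here is a minimal, self-contained sketch (not the actual `BlockSplitBloomFilter` code). The class `BloomSizeSketch` and its method names are hypothetical, and the multiply-shift block mapping is my assumption about how a block could be selected; only the 32-byte `BYTES_PER_BLOCK` value comes from the discussion above.

```java
// Hypothetical illustration only; names and logic are assumptions, not parquet-mr code.
final class BloomSizeSketch {
    // Block size in bytes, as mentioned in the comment above.
    static final int BYTES_PER_BLOCK = 32;

    // A bitset length is usable when it splits evenly into whole blocks.
    static boolean isWholeBlocks(int numBytes) {
        return numBytes > 0 && numBytes % BYTES_PER_BLOCK == 0;
    }

    // Stricter condition: exactly one bit set in the length.
    static boolean isPowerOfTwo(int numBytes) {
        return numBytes > 0 && (numBytes & (numBytes - 1)) == 0;
    }

    // Map the upper 32 bits of a 64-bit hash onto [0, numBlocks) with a
    // multiply-shift. This works for any positive block count, not only
    // powers of 2.
    static int blockIndex(long hash, int numBytes) {
        long numBlocks = numBytes / BYTES_PER_BLOCK;
        long high = hash >>> 32; // unsigned upper half of the hash
        return (int) ((high * numBlocks) >>> 32);
    }

    public static void main(String[] args) {
        int numBytes = 96; // 3 blocks: a multiple of 32, but not a power of 2
        System.out.println("whole blocks: " + isWholeBlocks(numBytes));
        System.out.println("power of two: " + isPowerOfTwo(numBytes));
        System.out.println("block index:  " + blockIndex(0xFFFFFFFF00000000L, numBytes));
    }
}
```

Under these assumptions, a 96-byte bitset passes the whole-blocks check and every hash still maps to one of its 3 blocks, which is consistent with a power of 2 being a convenience rather than a correctness requirement.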
@chenjunjiedada could you take a look at this? thank you.