[ 
https://issues.apache.org/jira/browse/COLLECTIONS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052059#comment-18052059
 ] 

Gary D. Gregory edited comment on COLLECTIONS-883 at 1/15/26 11:52 AM:
-----------------------------------------------------------------------

[~claude] or [~aherbert], any thoughts here?

I think this is more of a feature request than a bug, since our implementation 
is based on {{int}} values and arrays, whose maximum length is 
{{Integer.MAX_VALUE}} or {{Integer.MAX_VALUE - 8}}, depending on the underlying 
JVM.


was (Author: garydgregory):
[~claude] or [~aherbert], any thoughts here?

> BloomFilter Shape class limits numberOfBits to int, preventing large-scale 
> filters (>2.1B bits)
> -----------------------------------------------------------------------------------------------
>
>                 Key: COLLECTIONS-883
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-883
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Bloomfilter
>    Affects Versions: 4.5.0
>            Reporter: Ayush Sharma
>            Priority: Major
>         Attachments: code.png, error.png
>
>
> *Problem*
> When creating a Bloom filter for large datasets using Shape.fromNP(n, p), the 
> operation fails if the calculated number of bits exceeds Integer.MAX_VALUE 
> (~2.1 billion).
> *Error Message*
> Resulting filter has more than 2147483647 bits: 7.569340059E9
> *Environment*
> * Dataset: ~500 million elements
> * False Positive Probability: 0.005
> * Apache Commons Collections version: 4.5.0-M2
> *Root Cause*
> The Shape class stores numberOfBits as an int:
> * Shape.fromNP(int n, double p)
> * Shape.fromKM(int k, int m)  
> * Shape.fromNM(int n, int m)
> * Shape.getNumberOfBits() returns int
> For large-scale applications, the required bits can exceed Integer.MAX_VALUE.
> *Calculation*
> m = -n × ln(p) / (ln(2))²
> m = -500,000,000 × ln(0.005) / 0.4805
> m ≈ 5.51 billion bits
> *Suggested Fix*
> Change numberOfBits from int to long in the Shape class and related methods.
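
The calculation quoted above can be checked with a few lines of plain Java. This is only a sketch of the textbook formula from the report, not the library's own code: the class and method names ({{BloomBitsEstimate}}, {{requiredBits}}, {{fitsInInt}}) are hypothetical and no Commons Collections API is used. Note that the figure in the quoted error message (7.569340059E9) differs from what this formula yields for n = 500 million and p = 0.005, so the reporter's actual inputs may not match the listed environment.

```java
public final class BloomBitsEstimate {

    private BloomBitsEstimate() {
    }

    /**
     * Textbook estimate m = -n * ln(p) / (ln 2)^2, rounded up.
     * Returned as a double so that values above Integer.MAX_VALUE are representable.
     */
    static double requiredBits(long n, double p) {
        double ln2 = Math.log(2);
        return Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** True if the bit count could be stored in an int field. */
    static boolean fitsInInt(double bits) {
        return bits <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Values taken from the report: ~500 million elements, p = 0.005.
        double m = requiredBits(500_000_000L, 0.005);
        System.out.printf("required bits: %.0f, fits in int: %b%n", m, fitsInInt(m));
    }
}
```

For the reported inputs the estimate is above {{Integer.MAX_VALUE}}, which is consistent with the failure described in the issue even though the exact number in the error message is larger.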



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
