aherbert commented on a change in pull request #140: Bloomfilter updates2
URL: 
https://github.com/apache/commons-collections/pull/140#discussion_r392117139
 
 

 ##########
 File path: 
src/main/java/org/apache/commons/collections4/bloomfilter/hasher/Shape.java
 ##########
 @@ -81,34 +83,36 @@
     private final HashFunctionIdentity hashFunctionIdentity;
 
     /**
-     * Constructs a filter configuration with the specified number of items and
-     * probability.
+     * Constructs a filter configuration with a desired false-positive 
probability ({@code p}) and the
+     * specified number of bits ({@code m}) and hash functions ({@code k}).
      *
-     * @param hashFunctionIdentity The HashFunctionIdentity of the hash 
function this shape uses.
-     * @param probability The probability of duplicates. Must be in the range
-     * (0.0,1.0).
-     * @param numberOfBits The number of bits in the filter.
-     * @param numberOfHashFunctions The number of hash functions in the filter.
+     * <p>The number of items ({@code n}) to be stored in the filter is 
computed.
+     * <pre>n = ceil(m / (-k / log(1 - exp(log(p) / k))))</pre>
+     *
+     * <p>The actual probability will be approximately equal to the
+     * desired probability but will be dependent upon the calculated Bloom 
filter capacity
+     * (number of items). An exception is raised if this is greater than or 
equal to 1 (i.e. the
+     * shape is invalid for use as a Bloom filter).
+     *
+     * @param hashFunctionIdentity The identity of the hash function this 
shape uses
+     * @param probability The desired false-positive probability in the range 
{@code (0, 1)}
+     * @param numberOfBits The number of bits in the filter
+     * @param numberOfHashFunctions The number of hash functions in the filter
+     * @throws NullPointerException if the hash function identity is null
+     * @throws IllegalArgumentException if the desired probability is not in 
the range {@code (0, 1)}
+     * @throws IllegalArgumentException if the number of bits is not above 8
 
 Review comment:
   OK. I have changed the check to number of bits must be above 0 to 
investigate the computations. This
   ```java
   // numberOfBits = 1
   // numberOfHashFunctions = 1;
   new Shape(testFunction, Math.nextDown(1.0), 1, 1);
   ```
   Creates a Shape with a number of items as 37 and a probability of 
0.9999999999999999.
   
   Using a number of bits of 4 creates a Shape with a number of items as 147 
and a probability of 0.9999999999999999.
   
   Using a number of bits of 8 creates a Shape with a number of items as 294 
and a probability of 0.9999999999999999.
   
   This is a bad case that will create a non-sense filter shape but the 
question is do we allow this? 
   
   It may be better to throw an exception when the calculated number of items 
is above the number of bits. If each bit represents a single item then it is 
impossible to filter more items than number of bits.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to