Claudenw commented on a change in pull request #160:
URL: 
https://github.com/apache/commons-collections/pull/160#discussion_r426130303



##########
File path: 
src/test/java/org/apache/commons/collections4/bloomfilter/hasher/HasherBuilderTest.java
##########
@@ -29,7 +32,8 @@
 
 /**
  * Tests the
- * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder 
Hasher.Builder}.
+ * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
+ * Hasher.Builder}.

Review comment:
       The problem with converting T directly to a byte[] is two fold.
   
   First, to do the conversion to the byte[] you need to know the Shape.
   
   Second, it confuses the architecture.  In the architecture
   
   The Hasher is responsible for
   - hashing one or more items.
   - being able to replay the hash of the item(s).
   - converting the hash(es) into a collection of indexes into a bit vector.
   
   The Shape is responsible for: 
   - providing the number of bits that is represented by each item in the Bloom 
filter, also known as the number of hash functions, or _k_.
   - providing the length of the bit vector, also known as _m_.
   - providing the number of expected items in the Bloom filter, also known as 
_n_.
   - calculating the probability of false positive results, also known as _p_.
   
   The Bloom filter is responsible for:
   - the implementation and management of a logical bit vector of _m_ bits.
   - determining if another Bloom filter is contained in this filter. 
`contains( BloomFilter )`
   - determining if the items contained in Hasher, taken as a whole, are 
contained in this filter. `contains( Hasher )`.  This is equivalent to 
constructing a second Bloom filter and calling the above but without the 
overhead.
   - merging another bloom filter into this filter. `merge( BloomFilter )`
   - merging the items contained in a Hasher, taken as a whole, into this 
filter. `merge( Hasher )`.  As with the contains method this is equivalent to 
constructing a Bloom filter and calling the above.
   
   By using a function to go directly from T to byte[], the the function has to 
know the Shape of the filter that it is going to be used in.  Under the current 
architecture the Hasher does not care what the Shape is until the collection of 
indexes for a specific Shape are requested. 
   
   The SimpleHasher was chosen because it defines the underlying hash function 
(Murmur128x64Cyclic), any Hasher implementation would be an acceptable return 
value for the Function in the constructor.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to