Claudenw commented on a change in pull request #160:
URL:
https://github.com/apache/commons-collections/pull/160#discussion_r426130303
##########
File path:
src/test/java/org/apache/commons/collections4/bloomfilter/hasher/HasherBuilderTest.java
##########
@@ -29,7 +32,8 @@
/**
* Tests the
- * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
Hasher.Builder}.
+ * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
+ * Hasher.Builder}.
Review comment:
The problem with converting T directly to a byte[] is two fold.
First, to do the conversion to the byte[] you need to know the Shape.
Second, it confuses the architecture. In the architecture
The Hasher is responsible for
- hashing one or more items.
- being able to replay the hash of the item(s).
- converting the hash(es) into a collection of indexes into a bit vector.
The Shape is responsible for:
- providing the number of bits that is represented by each item in the Bloom
filter, also known as the number of hash functions, or _k_.
- providing the length of the bit vector, also known as _m_.
- providing the number of expected items in the Bloom filter, also known as
_n_.
- calculating the probability of false positive results, also known as _p_.
The Bloom filter is responsible for:
- the implementation and management of a logical bit vector of _m_ bits.
- determining if another Bloom filter is contained in this filter.
`contains( BloomFilter )`
- determining if the items contained in Hasher, taken as a whole, are
contained in this filter. `contains( Hasher )`. This is equivalent to
constructing a second Bloom filter and calling the above but without the
overhead.
- merging another bloom filter into this filter. `merge( BloomFilter )`
- merging the items contained in a Hasher, taken as a whole, into this
filter. `merge( Hasher )`. As with the contains method this is equivalent to
constructing a Bloom filter and calling the above.
By using a function to go directly from T to byte[], the the function has to
know the Shape of the filter that it is going to be used in. Under the current
architecture the Hasher does not care what the Shape is until the collection of
indexes for a specific Shape are requested.
The SimpleHasher was chosen because it defines the underlying hash function
(Murmur128x64Cyclic), any Hasher implementation would be an acceptable return
value for the Function in the constructor.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]