Claudenw commented on PR #331:
URL: 
https://github.com/apache/commons-collections/pull/331#issuecomment-1242671100

   IndexProducers may return duplicates and make no order guarantees.  (There 
used to be an order guarantee, but we removed it.)
   
   Hasher-based IndexProducers, by their nature, generally return unordered and 
possibly duplicate values.  There is a hasher method to produce an 
IndexProducer that guarantees uniqueness.
   
   BloomFilter-based IndexProducers, by their nature, generally return ordered 
and unique values.  I can think of implementations where the ordering would 
not hold, but we don't have one.
   
   The default implementation of IndexProducer uses a BitSet to simplify the 
code that produces the index list, so the uniqueness is an artefact of the 
implementation.  If you have a fast implementation that can take the 
forEachIndex() output and convert it to an array without imposing the 
uniqueness constraint, then please implement that.
   
   In short I concur with your assessment.
   
   The HasherCollection is intended to simplify creation of some filters.  In 
practice it is the same as calling functions with hasher arguments once for 
each hasher in the collection.  Due to the differences between classes of Bloom 
filters (e.g. standard, counting, stable), there are times when the duplicates 
are required.
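
   As a rough illustration of why duplicates can matter, the sketch below 
feeds the same collection of hashers into a counting-style merge and a 
standard-style merge.  Everything here (`SketchHasher`, 
`HasherCollectionSketch`, the fixed-index hashers) is hypothetical and only 
models the "call each hasher in turn" behaviour described above; it is not 
the library's HasherCollection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a hasher: emits indices for a filter of numberOfBits bits.
interface SketchHasher {
    boolean forEachIndex(int numberOfBits, IntPredicate consumer);
}

final class HasherCollectionSketch {
    // Illustrative hasher that emits fixed indices (reduced modulo numberOfBits).
    static SketchHasher fixed(int... indices) {
        return (numberOfBits, consumer) -> {
            for (int i : indices) {
                if (!consumer.test(i % numberOfBits)) {
                    return false;
                }
            }
            return true;
        };
    }

    // A collection acts like calling each hasher once, in order.
    // Indices shared by two hashers are therefore emitted twice.
    static SketchHasher collection(List<SketchHasher> hashers) {
        return (numberOfBits, consumer) -> {
            for (SketchHasher h : hashers) {
                if (!h.forEachIndex(numberOfBits, consumer)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Counting-style merge: every emitted index increments its cell.
    static int[] countingMerge(SketchHasher hasher, int numberOfBits) {
        int[] counts = new int[numberOfBits];
        hasher.forEachIndex(numberOfBits, i -> {
            counts[i]++;
            return true;
        });
        return counts;
    }

    // Standard-style merge: a bit is simply on or off, so duplicates are harmless.
    static boolean[] standardMerge(SketchHasher hasher, int numberOfBits) {
        boolean[] bits = new boolean[numberOfBits];
        hasher.forEachIndex(numberOfBits, i -> {
            bits[i] = true;
            return true;
        });
        return bits;
    }

    public static void main(String[] args) {
        SketchHasher both = collection(Arrays.asList(fixed(1, 4), fixed(4, 6)));
        System.out.println(Arrays.toString(countingMerge(both, 8)));
        System.out.println(Arrays.toString(standardMerge(both, 8)));
    }
}
```

   In the counting-style merge, index 4 ends up with a count of 2 because 
both hashers emit it.  Deduplicating across the collection would lose that 
second increment, while the standard-style result would be unchanged.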
   
   HasherCollections work well in distributed systems where an object is 
represented in a Bloom filter by hashes of multiple properties.  For a query 
across the systems, a HasherCollection is constructed and passed to the 
endpoints.  The endpoints can then build filters based on the shape at the 
endpoint.  So the specific back-end systems do not have to agree on shape, but 
they do have to agree on the hash algorithm.
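
   A minimal sketch of that arrangement follows, assuming each property hash 
is shipped as a pair of longs (initial value and increment) and each endpoint 
expands it into indices for its own shape by simple double hashing.  The 
`EndpointSketch` class, the pair-of-longs encoding and the sample hash values 
are all assumptions made for illustration, not the library's wire format or 
index algorithm.

```java
import java.util.BitSet;

// Hypothetical endpoint holding a standard Bloom filter with its own shape.
final class EndpointSketch {
    private final int numberOfHashFunctions; // k for this endpoint
    private final int numberOfBits;          // m for this endpoint
    private final BitSet bits;

    EndpointSketch(int numberOfHashFunctions, int numberOfBits) {
        this.numberOfHashFunctions = numberOfHashFunctions;
        this.numberOfBits = numberOfBits;
        this.bits = new BitSet(numberOfBits);
    }

    // Expands one (initial, increment) hash pair into k indices in [0, m).
    static int[] indices(long initial, long increment, int k, int m) {
        int[] out = new int[k];
        long h = initial;
        for (int i = 0; i < k; i++) {
            out[i] = (int) Math.floorMod(h, (long) m);
            h += increment;
        }
        return out;
    }

    // Stores an object described by one hash pair per property.
    void add(long[][] propertyHashes) {
        for (long[] pair : propertyHashes) {
            for (int i : indices(pair[0], pair[1], numberOfHashFunctions, numberOfBits)) {
                bits.set(i);
            }
        }
    }

    // Answers a query built from the same hash pairs, using this endpoint's shape.
    boolean mightContain(long[][] propertyHashes) {
        for (long[] pair : propertyHashes) {
            for (int i : indices(pair[0], pair[1], numberOfHashFunctions, numberOfBits)) {
                if (!bits.get(i)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up hash pairs standing in for two hashed properties of one object.
        long[][] query = { { 0x9E3779B97F4A7C15L, 0xC2B2AE3D27D4EB4FL }, { 12345L, 6789L } };

        EndpointSketch small = new EndpointSketch(3, 64);   // shape at endpoint A
        EndpointSketch large = new EndpointSketch(5, 1024); // shape at endpoint B
        small.add(query);
        large.add(query);

        // The same query payload works against both shapes.
        System.out.println(small.mightContain(query)); // true
        System.out.println(large.mightContain(query)); // true
    }
}
```

   The query payload carries only hash values, so each endpoint is free to 
choose its own number of bits and hash functions.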
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
