[GitHub] [commons-collections] aherbert commented on a diff in pull request #406: COLLECTIONS-844 - allow counting Bloom filters with cell size other than Integer.SIZE

via GitHub Sat, 29 Jul 2023 04:05:56 -0700


aherbert commented on code in PR #406:
URL: https://github.com/apache/commons-collections/pull/406#discussion_r1278285658



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -121,7 +186,7 @@ default boolean merge(final Hasher hasher) {
     /**
      * Merges the specified index producer into this Bloom filter.
      *
-     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1.</p>
+     * <p>Specifically: all cells for the indexes identified by the {@code indexProducer} will be incremented by 1.</p>

Review Comment:
   If you wish to return a distinct array then the default method in the 
interface using a bit set will be fine.
   
   If you wish to return a count of duplicate indices then the current 
implementation will work if you just sort the array it currently creates. 
Sorting may be faster than maintaining a complicated intermediate data 
structure: the hasher is likely to produce a small array, so the sort will be 
fast.
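   
   As a rough sketch of the sort-then-count idea (the class name 
`DuplicateIndexCounter` and the callback `IndexCountConsumer` are hypothetical 
names used for illustration only, not part of the commons-collections API), 
equal indices become adjacent after sorting, so each run length is the 
duplicate count:

```java
import java.util.Arrays;

public final class DuplicateIndexCounter {

    /** Hypothetical callback for illustration; receives each index once with its count. */
    @FunctionalInterface
    public interface IndexCountConsumer {
        boolean accept(int index, int count);
    }

    private DuplicateIndexCounter() {}

    /**
     * Sorts a copy of the indices and reports each distinct index together
     * with the number of times it occurred. Stops early and returns false
     * if the consumer returns false.
     */
    public static boolean forEachIndexCount(int[] indices, IndexCountConsumer consumer) {
        int[] sorted = indices.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int i = 0;
        while (i < sorted.length) {
            int index = sorted[i];
            int j = i + 1;
            // advance j to the end of the run of equal values
            while (j < sorted.length && sorted[j] == index) {
                j++;
            }
            if (!consumer.accept(index, j - i)) {
                return false;
            }
            i = j;
        }
        return true;
    }
}
```

   No intermediate map or bit set is needed; the only extra memory is the 
copied array.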
   
   This may be an opportunity to use some more characteristics flags where an 
index producer can declare whether its indices are sorted and/or contain 
duplicates. The interface can then provide default methods to remove 
duplicates or sort the indices.
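   
   One possible shape for this, loosely modelled on the characteristics bits 
of `java.util.Spliterator`. Everything here is an assumption for illustration: 
`FlaggedIndexProducer`, the `SORTED` and `DISTINCT` constants, and the default 
methods are hypothetical and do not exist in commons-collections.

```java
import java.util.Arrays;

public final class IndexCharacteristicsSketch {

    private IndexCharacteristicsSketch() {}

    /** Hypothetical interface; not the real IndexProducer. */
    public interface FlaggedIndexProducer {
        /** Set when the producer emits indices in ascending order. */
        int SORTED = 0x1;
        /** Set when the producer never emits the same index twice. */
        int DISTINCT = 0x2;

        /** The raw indices, possibly unsorted and with duplicates. */
        int[] asIndexArray();

        /** Bitwise OR of the flags that hold; by default nothing is promised. */
        default int characteristics() {
            return 0;
        }

        /** Returns a sorted copy, skipping the sort if already declared sorted. */
        default int[] sortedIndices() {
            int[] a = asIndexArray().clone();
            if ((characteristics() & SORTED) == 0) {
                Arrays.sort(a);
            }
            return a;
        }

        /**
         * Returns a copy without duplicates. If the producer declares
         * DISTINCT the indices are returned in their original order;
         * otherwise they are sorted and compacted.
         */
        default int[] distinctIndices() {
            if ((characteristics() & DISTINCT) != 0) {
                return asIndexArray().clone();
            }
            int[] a = sortedIndices();
            int n = 0;
            for (int i = 0; i < a.length; i++) {
                // keep a value only if it differs from the last one kept
                if (n == 0 || a[i] != a[n - 1]) {
                    a[n++] = a[i];
                }
            }
            return Arrays.copyOf(a, n);
        }
    }
}
```

   A producer that already knows its output is sorted or distinct would 
override `characteristics()` and avoid the extra work; all others get correct 
behaviour from the defaults.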
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

