[GitHub] [commons-collections] Claudenw commented on a diff in pull request #406: COLLECTIONS-844 - allow counting Bloom filters with cell size other than Integer.SIZE

via GitHub Sat, 29 Jul 2023 05:31:00 -0700


Claudenw commented on code in PR #406:
URL: 
https://github.com/apache/commons-collections/pull/406#discussion_r1278292441



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -121,7 +186,7 @@ default boolean merge(final Hasher hasher) {
     /**
      * Merges the specified index producer into this Bloom filter.
      *
-     * <p>Specifically: all counts for the indexes identified by the {@code 
indexProducer} will be incremented by 1.</p>
+     * <p>Specifically: all cells for the indexes identified by the {@code 
indexProducer} will be incremented by 1.</p>

Review Comment:
   If we have IndexProducer.asArray output the indices in the order that they 
would be returned by IndexProducer.forEachIndex() then we are not changing the 
data based on the representation.
   
   This also means that sorting can be done by generating an array, using 
Array.sort() and then using the result to create an 
IndexProducer.fromIndexArray() to create the sorted index.
   
   The characteristics flag may be a good idea. We already have them in the 
test code.
   
   How about we move this forward as is and merge it.  I think the change is 
large enough (I was trying to keep is small).
   
   I still need to get the LayeredBloom filter added and it will need to be 
updated once we have this in place. 
   Finally, I have the StableBloom filter, which may clarify some of the 
questions we have a bout characteristics and whether we have chosen the proper 
restrictions on IndexProducer and CellProducers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #406: COLLECTIONS-844 - allow counting Bloom filters with cell size other than Integer.SIZE

Reply via email to