aherbert commented on code in PR #406:
URL:
https://github.com/apache/commons-collections/pull/406#discussion_r1278285658
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -121,7 +186,7 @@ default boolean merge(final Hasher hasher) {
/**
* Merges the specified index producer into this Bloom filter.
*
- * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1.</p>
+ * <p>Specifically: all cells for the indexes identified by the {@code indexProducer} will be incremented by 1.</p>
Review Comment:
If you wish to return a distinct array, then the default method in the
interface that uses a bit set will be fine.
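A minimal sketch of the bit-set approach, assuming indices are non-negative ints as Bloom filter indices are. The class `DistinctIndices` and its `distinct` method are hypothetical names for illustration, not the interface's actual default method.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper for illustration; not part of the Commons Collections API.
final class DistinctIndices {
    private DistinctIndices() {
    }

    /**
     * Returns the distinct indices in ascending order.
     * Assumes every index is non-negative; BitSet.set throws
     * IndexOutOfBoundsException for a negative index.
     */
    static int[] distinct(final int[] indices) {
        final BitSet seen = new BitSet();
        for (final int index : indices) {
            // Setting an already-set bit is a no-op, so duplicates collapse.
            seen.set(index);
        }
        // BitSet.stream() yields the set bits in ascending order.
        return seen.stream().toArray();
    }

    public static void main(final String[] args) {
        // prints [1, 3, 5]
        System.out.println(Arrays.toString(distinct(new int[] {5, 1, 5, 3, 1})));
    }
}
```

A side effect of the bit set is that the output is also sorted, at the cost of memory proportional to the largest index.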
If you wish to return a count for each duplicated index, then the current
implementation will work if you simply sort the array it already creates;
duplicates then sit next to each other and can be counted in a single pass.
This may be faster than using a complicated intermediate data structure: the
hasher is likely to produce a small array, so the sort will be fast.
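A minimal sketch of the sort-then-count idea, assuming the indices are already available as an `int[]`. `SortedIndexCounts`, `forEachCount` and `IndexCountConsumer` are hypothetical names chosen for illustration; they are not the PR's API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Commons Collections API.
final class SortedIndexCounts {
    private SortedIndexCounts() {
    }

    /** Receives each distinct index and the number of times it occurred. */
    @FunctionalInterface
    interface IndexCountConsumer {
        boolean accept(int index, int count);
    }

    /**
     * Sorts a copy of the indices so duplicates become adjacent, then reports
     * each run of equal values once. Returns false early if the consumer does.
     */
    static boolean forEachCount(final int[] indices, final IndexCountConsumer consumer) {
        final int[] sorted = indices.clone();
        Arrays.sort(sorted);
        int start = 0;
        while (start < sorted.length) {
            final int index = sorted[start];
            int end = start + 1;
            // Advance past every copy of the current index.
            while (end < sorted.length && sorted[end] == index) {
                end++;
            }
            if (!consumer.accept(index, end - start)) {
                return false;
            }
            start = end;
        }
        return true;
    }

    public static void main(final String[] args) {
        // prints "1 -> 2", "3 -> 1", "5 -> 2" on separate lines
        forEachCount(new int[] {5, 1, 5, 3, 1}, (index, count) -> {
            System.out.println(index + " -> " + count);
            return true;
        });
    }
}
```

The only extra memory is the cloned array, and for the short arrays a hasher typically yields, `Arrays.sort` is cheap.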
This may be an opportunity to add more characteristics flags, with which an
index producer can declare whether its indices are sorted and/or contain
duplicates. The interface can then provide default methods to remove
duplicates or sort them.
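A minimal sketch of how such flags might look, loosely modelled on the `SORTED` and `DISTINCT` characteristics of `java.util.Spliterator`. The interface `FlaggedIndexProducer`, its constants and its methods are all hypothetical; they are not the existing `IndexProducer` API.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical interface for illustration; not part of the Commons Collections API.
interface FlaggedIndexProducer {
    /** Flag: the indices are in ascending order. */
    int SORTED = 0x1;
    /** Flag: no index appears more than once. */
    int DISTINCT = 0x2;

    /** Returns the (non-negative) indices. */
    int[] asIndexArray();

    /** Bitwise OR of the flags this producer guarantees; none by default. */
    default int characteristics() {
        return 0;
    }

    /** Returns the indices in ascending order, skipping the sort if SORTED is declared. */
    default int[] sortedIndices() {
        final int[] indices = asIndexArray();
        if ((characteristics() & SORTED) != 0) {
            return indices;
        }
        final int[] copy = indices.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Returns each index once, skipping the work if DISTINCT is declared. */
    default int[] distinctIndices() {
        if ((characteristics() & DISTINCT) != 0) {
            return asIndexArray();
        }
        final BitSet bits = new BitSet();
        for (final int index : asIndexArray()) {
            bits.set(index);
        }
        // The bit-set path also happens to return the indices in ascending order.
        return bits.stream().toArray();
    }
}
```

A producer that already guarantees a property pays nothing for the default method; one that does not gets a correct fallback.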
--
This is an automated message from the Apache Git Service.