aherbert commented on code in PR #331:
URL:
https://github.com/apache/commons-collections/pull/331#discussion_r953714695
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
* @see #isValid()
* @see #subtract(BitCountProducer)
*/
- boolean remove(Hasher hasher);
+ default boolean remove(final Hasher hasher) {
+ Objects.requireNonNull(hasher, "hasher");
+ return
subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+ }
+
+ /**
+ * Removes the values from the specified IndexProducer from the Bloom
filter from this Bloom filter.
+ *
+ * <p>Specifically all counts for the unique indices produced by the
{@code hasher} will be
+ * decremented by 1.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * <p>Node: This method expects index producers that produce unique
values.</p>
+ *
+ * @param indexProducer the IndexProducer to provide the indexes
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #subtract(BitCountProducer)
+ */
+ default boolean remove(final IndexProducer indexProducer) {
+ Objects.requireNonNull(indexProducer, "indexProducer");
+ try {
+ return subtract(BitCountProducer.from(indexProducer));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
+ }
+ }
+
+ /**
+ * Removes the specified BitMapProducer from this Bloom filter.
+ *
+ * <p>Specifically all counts for the indices produced by the {@code
bitMapProducer} will be
+ * decremented by 1.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param bitMapProducer the BitMapProducer to provide the indexes
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #subtract(BitCountProducer)
+ */
+ default boolean remove(final BitMapProducer bitMapProducer) {
+ return remove(IndexProducer.fromBitMapProducer(bitMapProducer));
Review Comment:
This double wraps the input: `BitMapProducer -> IndexProducer ->
BitCountProducer`. This could be a future optimisation to go direct:
`BitMapProducer -> BitCountProducer`. The method could be done in an internal
helper class.
##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java:
##########
@@ -97,8 +97,8 @@ public void testUniqueIndex() {
List<Integer> full =
Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
producer = hasher.uniqueIndices(shape);
List<Integer> unique =
Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
- assertTrue( full.size() > unique.size() );
- Set<Integer> set = new HashSet<Integer>( unique );
- assertEquals( set.size(), unique.size() );
+ assertTrue(full.size() > unique.size());
+ Set<Integer> set = new HashSet<Integer>(unique);
Review Comment:
Note: You can use the diamond operator `<>` in generics so you do not
require the `<Integer>`
##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -196,6 +196,7 @@ public AbstractDefaultBloomFilter copy() {
result.indices.addAll(indices);
return result;
}
+
Review Comment:
Remove the extra line
##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
*/
protected abstract T createEmptyFilter(Shape shape);
- /**
- * Create the BloomFilter implementation we are testing.
- *
- * @param shape the shape of the filter.
- * @param hasher the hasher to use to create the filter.
- * @return a BloomFilter implementation.
- */
- protected abstract T createFilter(Shape shape, Hasher hasher);
Review Comment:
Why not maintain these methods as non-abstract, e.g.
```Java
protected T createFilter(Shape shape, Hasher hasher) {
T f = createEmptyFilter(Shape shape);
f.merge(hasher);
return f;
}
```
Then you do not have to change all the existing tests to create and then
merge the argument.
##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -186,8 +179,10 @@ public final void testContains() {
@Test
public final void testEstimateIntersection() {
- final BloomFilter bf = createFilter(getTestShape(), from1);
- final BloomFilter bf2 = createFilter(getTestShape(), bigHasher);
+ final BloomFilter bf = createEmptyFilter(getTestShape());
+ bf.merge(from1);
+ final BloomFilter bf2 = createEmptyFilter(getTestShape());
+ bf2.merge(from1);
Review Comment:
`bf2` should merge `bigHasher`
##########
src/main/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilter.java:
##########
@@ -140,12 +78,9 @@ private boolean add(int idx) {
return true;
}
- /**
- * Performs a merge using an IndexProducer.
- * @param indexProducer the IndexProducer to merge from.
- * @throws IllegalArgumentException if producer sends illegal value.
- */
- private void merge(IndexProducer indexProducer) {
+ @Override
+ public boolean merge(IndexProducer indexProducer) {
+ Objects.requireNonNull(indexProducer, "indexProducer");
Review Comment:
Note:
This merge method (and others such as remove/add/subtract) throw exceptions.
However the filter will be left in a corrupted state if some of the indices
have been processed before the exception. Implementing these methods as an
atomic transaction would be slower and more involved. So I suggest that the
non-atomic nature of the bulk updates to a filter are commented, perhaps at the
package level, as a design decision to achieve maximum performance under
correct usage.
##########
src/test/java/org/apache/commons/collections4/bloomfilter/SetOperationsTest.java:
##########
@@ -49,52 +49,63 @@ private static void assertSymmetricOperation(double
expected, ToDoubleBiFunction
assertEquals(expected, operation.applyAsDouble(filter2, filter1),
"op(filter2, filter1)");
}
+ private BloomFilter forTest(Shape shape, Hasher hasher) {
Review Comment:
I would name these `createFilter` to be self documenting.
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -92,12 +180,15 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
* @see #isValid()
* @see #subtract(BitCountProducer)
*/
- boolean remove(BloomFilter other);
+ default boolean remove(final BloomFilter other) {
+ Objects.requireNonNull(other, "other");
+ return subtract(BitCountProducer.from(other));
Review Comment:
Note: This calls subtract with a BitCountProducer. But does not catch and
wrap IOOBE. Should it be:
```Java
return remove((IndexProducer) other);
```
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
// Modification Operations
+ /**
+ * Merges the specified Bloom filter into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
other} filter will be incremented by 1,</p>
+ *
+ * <p>Note: If the other filter is a counting Bloom filter the index
counts are ignored and it is treated as an
+ * IndexProducer.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param other the other Bloom filter
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final BloomFilter other) {
+ Objects.requireNonNull(other, "other");
Review Comment:
Could this be changed to:
```Java
return merge((IndexProducer) other);
```
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
* @see #isValid()
* @see #subtract(BitCountProducer)
*/
- boolean remove(Hasher hasher);
+ default boolean remove(final Hasher hasher) {
+ Objects.requireNonNull(hasher, "hasher");
+ return
subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));
Review Comment:
No catch of IOOBE and wrapping to IAE
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
// Modification Operations
+ /**
+ * Merges the specified Bloom filter into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
other} filter will be incremented by 1,</p>
+ *
+ * <p>Note: If the other filter is a counting Bloom filter the index
counts are ignored and it is treated as an
+ * IndexProducer.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param other the other Bloom filter
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final BloomFilter other) {
+ Objects.requireNonNull(other, "other");
+ try {
+ return add(BitCountProducer.from(other));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(e);
+ }
+ }
+
+ /**
+ * Merges the specified Hasher into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the unique indexes identified by the
{@code hasher} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param hasher the hasher
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final Hasher hasher) {
+ Objects.requireNonNull(hasher, "hasher");
+ try {
+ return
add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
+ }
+ }
+
+ /**
+ * Merges the specified index producer into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
indexProducer} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * <p>Note: This method expects and index producer that does not return
duplicates.</p>
+ *
+ * @param indexProducer the IndexProducer
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final IndexProducer indexProducer) {
+ Objects.requireNonNull(indexProducer, "producer");
+ try {
+ return add(BitCountProducer.from(indexProducer));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
Review Comment:
The caught exception should be added as the cause.
Note: the other default methods that catch and wrap the IOOBE do not add a
message but do add the cause. There should be consistency here. Perhaps change
all to use the same message and add the IOOBE as the cause.
##########
src/main/java/org/apache/commons/collections4/bloomfilter/SimpleBloomFilter.java:
##########
@@ -133,12 +81,9 @@ public SimpleBloomFilter copy() {
return new SimpleBloomFilter(this);
}
- /**
- * Performs a merge using an IndexProducer.
- * @param indexProducer the IndexProducer to merge from.
- * @throws IllegalArgumentException if producer sends illegal value.
- */
- private void merge(IndexProducer indexProducer) {
+ @Override
+ public boolean merge(IndexProducer indexProducer) {
+ Objects.requireNonNull(indexProducer, "indexProducer");
indexProducer.forEachIndex(idx -> {
if (idx < 0 || idx >= shape.getNumberOfBits()) {
throw new IllegalArgumentException(String.format(
Review Comment:
I note that here the index range uses a closed range [0, x-1]. Other
exception message use an open range on the upper bound [0, x).
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
// Modification Operations
+ /**
+ * Merges the specified Bloom filter into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
other} filter will be incremented by 1,</p>
+ *
+ * <p>Note: If the other filter is a counting Bloom filter the index
counts are ignored and it is treated as an
+ * IndexProducer.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param other the other Bloom filter
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final BloomFilter other) {
+ Objects.requireNonNull(other, "other");
+ try {
+ return add(BitCountProducer.from(other));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(e);
+ }
+ }
+
+ /**
+ * Merges the specified Hasher into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the unique indexes identified by the
{@code hasher} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param hasher the hasher
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final Hasher hasher) {
+ Objects.requireNonNull(hasher, "hasher");
+ try {
+ return
add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
+ }
+ }
+
+ /**
+ * Merges the specified index producer into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
indexProducer} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * <p>Note: This method expects and index producer that does not return
duplicates.</p>
+ *
+ * @param indexProducer the IndexProducer
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final IndexProducer indexProducer) {
+ Objects.requireNonNull(indexProducer, "producer");
+ try {
+ return add(BitCountProducer.from(indexProducer));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
+ }
+ }
+
+ /**
+ * Merges the specified BitMap producer into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
bitMapProducer} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param bitMapProducer the BitMapProducer
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final BitMapProducer bitMapProducer) {
+ return merge(IndexProducer.fromBitMapProducer(bitMapProducer));
Review Comment:
`Objects.requireNonNull(indexProducer, "bitMapProducer");` ?
Just to get the correct exception message rather than the message from the
IndexProducer.
##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter,
BitCountProducer {
// Modification Operations
+ /**
+ * Merges the specified Bloom filter into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the indexes identified by the {@code
other} filter will be incremented by 1,</p>
+ *
+ * <p>Note: If the other filter is a counting Bloom filter the index
counts are ignored and it is treated as an
+ * IndexProducer.</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param other the other Bloom filter
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final BloomFilter other) {
+ Objects.requireNonNull(other, "other");
+ try {
+ return add(BitCountProducer.from(other));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(e);
+ }
+ }
+
+ /**
+ * Merges the specified Hasher into this Bloom filter.
+ *
+ * <p>Specifically: all counts for the unique indexes identified by the
{@code hasher} will be incremented by 1,</p>
+ *
+ * <p>This method will return {@code true} if the filter is valid after
the operation.</p>
+ *
+ * @param hasher the hasher
+ * @return {@code true} if the removal was successful and the state is
valid
+ * @see #isValid()
+ * @see #add(BitCountProducer)
+ */
+ default boolean merge(final Hasher hasher) {
+ Objects.requireNonNull(hasher, "hasher");
+ try {
+ return
add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+ } catch (IndexOutOfBoundsException e) {
+ throw new IllegalArgumentException(
+ String.format("Filter only accepts values in the [0,%d)
range", getShape().getNumberOfBits()));
Review Comment:
The caught exception should be added as the cause
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]