aherbert commented on code in PR #335:
URL: 
https://github.com/apache/commons-collections/pull/335#discussion_r1002411748


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,46 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assumptions.assumeTrue;
 
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.apache.commons.collections4.bag.TreeBag;
 import 
org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends 
AbstractIndexProducerTest {
 
+    /** Flag to indicate the {@link 
BitCountProducer#forEachCount(BitCountConsumer)} is ordered. */
+    protected static final int FOR_EACH_COUNT_ORDERED = 0x10;

Review Comment:
   In the interest of simplicity, can these be dropped?
   
   I previously stated that we should only have them if the behaviour of a 
BitCountProducer will be different from an IndexProducer. From what I can see 
in the tests these are used in 3 tests, and only when `FOR_EACH_ORDERED` and 
`FOR_EACH_DISTINCT` from the AbstractIndexProducerTest are also used. Dropping 
these constants will reduce maintenance and ensure the flags are correctly set 
for BitCount and Index producers.
   
   You can add a note in the test to state that the same flags are reused.
   
   I think the simplest way to do this is to update the flags to be private:
   ```Java
   /** Flag to indicate the {@link 
BitCountProducer#forEachCount(BitCountConsumer)} is ordered.
    * This flag currently reuses the value from IndexProducer. At present no 
implementations
    * exhibit different behaviour for the two interfaces. This may change in 
the future and
    * an explicit flag value will be required. */
   private static final int FOR_EACH_COUNT_ORDERED = FOR_EACH_ORDERED;
   /** Flag to indicate the {@link 
BitCountProducer#forEachCount(BitCountConsumer)} is distinct.
    * This flag currently reuses the value from IndexProducer. At present no 
implementations
    * exhibit different behaviour for the two interfaces. This may change in 
the future and
    * an explicit flag value will be required. */
   private static final int FOR_EACH_COUNT_DISTINCT = FOR_EACH_DISTINCT;
   ```
   An alternative would be to have a flag for unspecified behaviour. This must 
be explicitly set by implementing tests:
   ```
   FOR_EACH_COUNT_UNSPECIFIED = 0x10;
   FOR_EACH_COUNT_ORDERED = 0x20;
   FOR_EACH_COUNT_DISTINCT = 0x40;
   ```
   The abstract test can then check if the behaviour is unspecified. If true 
then the other flags must be unset. If false then at least one of the other 
flags must be set. I think this is more work and not required, especially since 
we are testing the behaviour that is not a part of the contract given in the 
javadoc of the main implementations. This is more of a behaviour sense check of 
the implementations.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +72,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 
1.

Review Comment:
   I was considering why do we need to do this: convert an IndexProducer to a 
BitCountProducer. It would be to add it to a CountingBloomFilter. In this case 
the conversion does not remove duplicate indices. This means that the 
BitCountProducer can increment the count of an index more than 1 when adding a 
single IndexProducer.
   
   This has the following consequences:
   - The count at each index of a CountingBloomFilter does not represent the 
number of individual hashes added to the filter that had that index enabled. 
The count is bounded by the number of hashes as a lower limit, i.e. it may be 
higher.
   - Removal of an IndexProducer from a counting bloom filter must use the same 
method to construct the BitCountProducer as the addition in order to use the 
exact same index->count mapping.
   
   I am fine with this behaviour. Filtering an IndexProducer to unique indices 
is extra work. The `IndexFilter` class helps with this but requires a `Shape`. 
So it cannot be done in the BitCountProducer interface which has no Shape; 
other mechanisms to filter the IndexProducer without a bound on the index would 
be inefficient.
   
   So perhaps we require a bit of extra documentation here to serve as a 
user-beware warning:
   ```
    * <p>Note that the BitCountProducer does not remove duplicates. Any use of 
the
    * BitCountProducer to create an aggregate mapping of index to counts, such 
as a
    * CountingBloomFilter, should use the same BitCountProducer in both add and
    * subtract operations to maintain consistency.
   ```
   I believe this is consistent with the new wording in the javadoc header for 
the BitCountProducer interface.
   
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +71,89 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not 
implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should 
be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be 
true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be 
called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
+
+    @Test
+    public void testForEachCountValues() {
+        // Assumes the collections bag works. Could be replaced with 
Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], 
c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty 
should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty 
should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), 
"empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), 
"empty should be true");
+    @Test
+    public final void testBehaviourForEachCount() {

Review Comment:
   Add a javadoc note:
   ```Java
   /**
    * Test the behaviour of {@link 
BitCountProducer#forEachCount(BitCountConsumer)} with respect
    * to ordered and distinct indices. Currently the behaviour is assumed to be 
the same as
    * {@link IndexProducer#forEachIndex(java.util.function.IntPredicate)}.
    */
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not 
implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should 
be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be 
true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be 
called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty 
should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty 
should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), 
"empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), 
"empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with 
Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], 
c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();

Review Comment:
   As before, assumeTrue should be in the line after you get the flags.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to