aherbert commented on code in PR #335:
URL: 
https://github.com/apache/commons-collections/pull/335#discussion_r1003576587


##########
src/main/java/org/apache/commons/collections4/bloomfilter/Hasher.java:
##########
@@ -49,7 +49,8 @@ public interface Hasher {
      * Creates an IndexProducer of unique indices for this hasher based on the 
Shape.
      *
      * <p>This is like the `indices(Shape)` method except that it adds the 
guarantee that no
-     * duplicate values will be returned</p>
+     * duplicate values will be returned.  The indices produced are equivalent 
those returned

Review Comment:
   `equivalent to those`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of 
the contained

Review Comment:
   delete `of` before comprising



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of 
the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values 
from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom 
filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>
+     *
+     * <p>A BitCountProducer generated from this IndexProducer is equivalent 
to a BitCountProducer from a
+     * counting Bloom filter that was constructed from the contained 
hashers.<p>

Review Comment:
   IIUC this is only true if the counting Bloom filter was constructed from 
unique indices from the contained hashers. So perhaps change to: `constructed 
from the contained hashers unique indices.`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends 
AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),

Review Comment:
   This tests a saturation of the number of bits. We should really have a test 
for non saturation, no collisions just to make sure that works as well.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -141,29 +172,16 @@ public boolean forEachIndex(IntPredicate consumer) {
 
         @Override
         public int[] asIndexArray() {
-            List<int[]> lst = new ArrayList<>();
-            int[] count = new int[1];
-            /*
-             * This method needs to return duplicate indices
-             */
-            for (Hasher hasher : hashers) {
-                int[] ary = hasher.indices(shape).asIndexArray();
-                lst.add(ary);
-                count[0] += ary.length;
-            }
-            if (lst.isEmpty()) {
-                return new int[0];
-            }
-            if (lst.size() == 1) {
-                return lst.get(0);
-            }
-            int[] result = new int[count[0]];
-            int offset = 0;
-            for (int[] ary : lst) {
-                System.arraycopy(ary, 0, result, offset, ary.length);
-                offset += ary.length;
-            }
-            return result;
+            int[] result = new 
int[shape.getNumberOfHashFunctions()*hashers.size()];

Review Comment:
   This is a much better implementation.
   
   Can you add whitespace around the multiply ` * `.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends 
AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 
2)).absoluteUniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new 
HasherCollection().uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return 0;

Review Comment:
   Should this have flags for distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromUniqueHasherCollectionTest extends 
AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 2)).uniqueIndices(Shape.fromKM(11, 
10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return 
BitCountProducer.from(NullHasher.INSTANCE.uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered

Review Comment:
   `// HasherCollection allows ...`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasherTest.java:
##########
@@ -35,6 +36,12 @@ protected Hasher createEmptyHasher() {
         return NullHasher.INSTANCE;
     }
 
+

Review Comment:
   Remove extra line



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of 
the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values 
from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom 
filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>

Review Comment:
   returning an IndexProducer with the concatenated output indices from each 
filter.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends 
AbstractIndexProducerTest {

Review Comment:
   Remove the 2 from the name suffix



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();

Review Comment:
   You have not removed the redundant part of this test. The distinctness is 
tested in `testBehaviourAsIndexArray`.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -106,6 +120,23 @@ public boolean forEachIndex(IntPredicate consumer) {
         };
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices across all 
the contained
+     * hashers.
+     *
+     * <p>This is equivalent to an IndexProducer created from a Bloom filter 
that comprises all
+     * the contained hashers.</p>
+     *
+     * @param shape the shape of the desired Bloom filter.
+     * @return the iterator of integers
+     */
+    public IndexProducer absoluteUniqueIndices(final Shape shape) {
+        return consumer -> {
+            Objects.requireNonNull(consumer, "consumer");
+            return uniqueIndices(shape).forEachIndex(IndexFilter.create(shape, 
consumer));

Review Comment:
   IIUC this can exceed the capacity of the IndexFilter. The 
uniqueIndices(shape) will output up to k indices for each hasher in the 
collection (eliminating duplicates). Thus if we have two hashers with 
shape(k=5, n=1000) we could output 10 indices from the combined unique indices. 
Then we create an IndexFilter for shape which could be optimised only to check 
up to k=5 indices.
   
   This throws an index out of bounds exception from IndexFilter.ArrayTracker.
   
   ```Java
   @Test
   public void testAbsoluteUniqueIndices() {
       int[] actual = new HasherCollection(
           new IncrementingHasher(1, 1),
           new IncrementingHasher(10, 1)
       ).absoluteUniqueIndices(Shape.fromKM(5, 1000)).asIndexArray();
       int[] expected = IntStream.concat(
               IntStream.range(1, 1 + 5),
               IntStream.range(10, 10 + 5)
           ).toArray();
       Assertions.assertArrayEquals(expected, actual);
   }
   ```
   
   When I update HasherCollection absoluteUniqueIndices to use a fake shape 
that forces the IndexTracker.BitMapTracker implementation then the above tests 
passes:
   ```Java
   public IndexProducer absoluteUniqueIndices(final Shape shape) {
       return consumer -> {
           Objects.requireNonNull(consumer, "consumer");
           return uniqueIndices(shape).forEachIndex(IndexFilter.create(
                   Shape.fromKM(shape.getNumberOfBits(), 
shape.getNumberOfBits()), consumer));
       };
   }
   ```
   
   I think that this can be easily solved by having a new package-private 
factory constructor in IndexFilter to choose the implementation that can handle 
exceeding the number of hash functions in the shape:
   
   ```Java
   static IntPredicate create(int numberOfBits, IntPredicate consumer) {
       // Choose the BitMapTracker implementation to handle any number of hash 
functions
   ```
   
   Or you could create a new Shape as:
   ```Java
   public IndexProducer absoluteUniqueIndices(final Shape shape) {
       return consumer -> {
           Objects.requireNonNull(consumer, "consumer");
           return uniqueIndices(shape).forEachIndex(IndexFilter.create(
                   Shape.fromKM(shape.getNumberOfHashFunctions() * 
hashers.size(),
                                shape.getNumberOfBits()), consumer));
       };
   }
   ```
   Perhaps the later is simpler.
   
   
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/UniqueIndexProducerFromHasherCollectionTest.java:
##########
@@ -36,4 +36,11 @@ protected int getBehaviour() {
         // index from each hasher. The result is there may still be duplicates.
         return 0;
     }
+
+

Review Comment:
   Remove extra line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends 
AbstractIndexProducerTest {
+
+    // selecting 11 items from the range [0,9] will cause a collision
+    private Shape shape = Shape.fromKM(11, 10);
+
+    @Override
+    protected IndexProducer createProducer() {
+        return new HasherCollection(new IncrementingHasher(1, 1), new 
IncrementingHasher(2, 2)).absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected IndexProducer createEmptyProducer() {
+        return new HasherCollection().absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // HasherCollection allows duplicates and may be unordered

Review Comment:
   Update comment as duplicates are not allowed from absoluteUniqueIndices.
   
   Q. Should forEach also be distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherTest.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromUniqueHasherTest extends 
AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new IncrementingHasher(4, 
8).uniqueIndices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return 
BitCountProducer.from(NullHasher.INSTANCE.indices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return AS_ARRAY_DISTINCT;

Review Comment:
   Update the comment as this does not allow duplicates
   
   Q. Is forEach also distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+

Review Comment:
   Remove empty line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to