aherbert commented on a change in pull request #160:
URL: 
https://github.com/apache/commons-collections/pull/160#discussion_r426131444



##########
File path: 
src/test/java/org/apache/commons/collections4/bloomfilter/hasher/HasherBuilderTest.java
##########
@@ -29,7 +32,8 @@
 
 /**
  * Tests the
- * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder 
Hasher.Builder}.
+ * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
+ * Hasher.Builder}.

Review comment:
       I think I was not clear. The `byte[]` is just the part of the item T 
that you want to hash. So you have two situations where you convert the item T 
to either:
   
   1. T -> List<byte[]>
   2. T -> byte[]
   
   These are then fed into the DynamicHasher. In case 1 you have a byte[] for 
each property or part of T you want to add to the Bloom filter. In case 2 you 
are adding the entire item T as a single entity.
   
   You could group 1 and 2 into the same situation where I concatenate the 
parts that would make up the List<byte[]> to a single longer array and create a 
SimpleBuilder containing the result. The idea is the same: a single item T 
should only add one set of indices to the Bloom filter.
   
   I do not suggest by-passing the Hasher. I suggest making it easier to 
specify case 2. Thus as a user I can convert my item to a binary form `byte[]` 
then the SimpleBloomFilter would do the rest of the work to hash the binary 
form and create a sequence of indexes for the filter shape.
   
   This is why I suggested a `Function<T, byte[]>` version. The `byte[]` would 
be used internally by the SimpleBloomFilter to create a Hasher and generate 
indices for the Shape. Essentially this would be:
   
   ```java
   byte[] b = ...;
   Hasher h = new SimpleBuilder().with(b).build();
   ```
   
   But this could be done using a specialised route that avoids all the object 
creation, for example using a variant of DynamicHasher that has a single 
`byte[]` to hash:
   
   ```java
   byte[] b = ...; // The SimpleBloomFilter converts T to byte[]
   HashFunction hf = ...; // From the SimpleBloomFilter
   Hasher h = new DynamicHasher(hf, b);
   ```
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to