aherbert commented on a change in pull request #160:
URL:
https://github.com/apache/commons-collections/pull/160#discussion_r426131444
##########
File path:
src/test/java/org/apache/commons/collections4/bloomfilter/hasher/HasherBuilderTest.java
##########
@@ -29,7 +32,8 @@
/**
* Tests the
- * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
Hasher.Builder}.
+ * {@link org.apache.commons.collections4.bloomfilter.hasher.Hasher.Builder
+ * Hasher.Builder}.
Review comment:
I think I was not clear. The `byte[]` is just the part of the item T
that you want to hash. So you have two situations where you convert the item T
to either:
1. T -> List<byte[]>
2. T -> byte[]
These are then fed into the DynamicHasher. In case 1 you have a byte[] for
each property or part of T you want to add to the Bloom filter. In case 2 you
are adding the entire item T as a single entity.
You could group 1 and 2 into the same situation where I concatenate the
parts that would make up the List<byte[]> to a single longer array and create a
SimpleBuilder containing the result. The idea is the same: a single item T
should only add one set of indices to the Bloom filter.
I do not suggest by-passing the Hasher. I suggest making it easier to
specify case 2. Thus as a user I can convert my item to a binary form `byte[]`
then the SimpleBloomFilter would do the rest of the work to hash the binary
form and create a sequence of indexes for the filter shape.
This is why I suggested a `Function<T, byte[]>` version. The `byte[]` would
be used internally by the SimpleBloomFilter to create a Hasher and generate
indices for the Shape. Essentially this would be:
```java
byte[] b = ...;
Hasher h = new SimpleBuilder().with(b).build();
```
But this could be done using a specialised route that avoids all the object
creation, for example using a variant of DynamicHasher that has a single
`byte[]` to hash:
```java
byte[] b = ...; // The SimpleBloomFilter converts T to byte[]
HashFunction hf = ...; // From the SimpleBloomFilter
Hasher h = new DynamicHasher(hf, b);
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]