[
https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990527#comment-16990527
]
Claude Warren commented on COLLECTIONS-728:
-------------------------------------------
*API Issue*
The Hash function takes a buffer and a seed and returns an integer. In general
this is a simple operation. You might use MD5, SHA256, Murmur128_x64 or
Murmur128_x86. So the name must specify what hash is used -- I think we can
agree on this.
Second, the calculation may be iterative or cyclic. Iterative is probably
the one you are familiar with, where the hashing function is called with a
different seed each time and that result is used. Cyclic is when two values are
created: call them partA and partB. The first time the function is called
partA is returned. For all subsequent calls partB is added to the previous
result and returned. This method is documented in the Cassandra codebase and
their experimentation has shown that it does not impact the randomness of the
subsequent Bloom filter but does significantly speed up processing. Generally
this method is used with a 128 bit hash as the result can be considered as 2
longs.
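To make the iterative/cyclic distinction concrete, here is a minimal sketch. The class name, the method names, the IntToLongFunction stand-in for a seeded hash, and the reduction to a bit index with Math.floorMod are all assumptions made for illustration; this is not the code in the attachments or in Cassandra.

```java
import java.util.Arrays;
import java.util.function.IntToLongFunction;

final class CyclicHashSketch {

    /** Iterative: call the (hypothetical) seeded hash once per index, with a new seed each time. */
    static int[] iterativeIndices(IntToLongFunction seededHash, int k, int m) {
        int[] indices = new int[k];
        for (int seed = 0; seed < k; seed++) {
            // One full hash computation per index.
            indices[seed] = (int) Math.floorMod(seededHash.applyAsLong(seed), (long) m);
        }
        return indices;
    }

    /** Cyclic: hash once, split the result into partA and partB, then only add. */
    static int[] cyclicIndices(long partA, long partB, int k, int m) {
        int[] indices = new int[k];
        long value = partA;                // the first call returns partA
        for (int i = 0; i < k; i++) {
            indices[i] = (int) Math.floorMod(value, (long) m);
            value += partB;                // later calls add partB (wraps on overflow)
        }
        return indices;
    }

    public static void main(String[] args) {
        // partA = 5, partB = 7 stand in for the two halves of a 128 bit hash.
        System.out.println(Arrays.toString(cyclicIndices(5L, 7L, 4, 10))); // prints [5, 2, 9, 6]
    }
}
```

The cyclic variant needs one hash computation per item regardless of the number of indices, which is where the speed-up comes from.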
Finally, the result of the function will differ depending on whether the
numeric values were calculated using unsigned or signed arithmetic.
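As a small illustration of the signed/unsigned point, the sketch below reduces the same 64 bit value to a bit index in two ways. The class and method names are hypothetical, and using Math.floorMod for the signed case is only one possible convention.

```java
final class SignednessSketch {

    /** Treats the hash as a signed long; floorMod keeps the result in [0, m). */
    static int signedIndex(long hash, int m) {
        return (int) Math.floorMod(hash, (long) m);
    }

    /** Treats the same 64 bits as an unsigned value. */
    static int unsignedIndex(long hash, int m) {
        return (int) Long.remainderUnsigned(hash, (long) m);
    }

    public static void main(String[] args) {
        long hash = -1L; // all 64 bits set; 2^64 - 1 when read as unsigned
        System.out.println(signedIndex(hash, 10));   // prints 9
        System.out.println(unsignedIndex(hash, 10)); // prints 5
    }
}
```

The two readings agree for non-negative values and can disagree once the top bit is set, so two filters built from the same hash output can end up with different bits enabled.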
This information needs to be passed down to components that do not keep a
reference to the hasher or may not even have the hasher implementation
available. When comparing 2 filters you have to know if the "shape" is the
same, but the "shape" does not maintain a reference to the hash function.
Classes like the static hasher, which maintains a list of the bits that were
enabled, need to know the shape but do not have a reference to the hash
function.
These classes need to be able to verify that the function that generated the
values was the same.
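One way to picture this is a small value object that describes the function (name, process, signedness) and travels with the shape, so that two shapes can be compared without access to the hasher. Every name below is hypothetical and chosen for illustration only; it is a sketch of the idea, not the proposed API.

```java
import java.util.Objects;

final class IdentitySketch {

    enum Process { ITERATIVE, CYCLIC }
    enum Signedness { SIGNED, UNSIGNED }

    /** Describes a hash function without holding a reference to its implementation. */
    static final class FunctionIdentity {
        final String name;
        final Process process;
        final Signedness signedness;

        FunctionIdentity(String name, Process process, Signedness signedness) {
            this.name = name;
            this.process = process;
            this.signedness = signedness;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FunctionIdentity)) {
                return false;
            }
            FunctionIdentity other = (FunctionIdentity) o;
            return name.equals(other.name)
                    && process == other.process
                    && signedness == other.signedness;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, process, signedness);
        }
    }

    /** A shape that carries the identity of the function that produced its bits. */
    static final class Shape {
        final int numberOfBits;
        final int numberOfHashFunctions;
        final FunctionIdentity identity;

        Shape(int numberOfBits, int numberOfHashFunctions, FunctionIdentity identity) {
            this.numberOfBits = numberOfBits;
            this.numberOfHashFunctions = numberOfHashFunctions;
            this.identity = identity;
        }

        /** Two filters are only comparable if size, hash count and function identity all match. */
        boolean isCompatibleWith(Shape other) {
            return numberOfBits == other.numberOfBits
                    && numberOfHashFunctions == other.numberOfHashFunctions
                    && identity.equals(other.identity);
        }
    }

    public static void main(String[] args) {
        FunctionIdentity a = new FunctionIdentity("Murmur128_x64", Process.CYCLIC, Signedness.SIGNED);
        FunctionIdentity b = new FunctionIdentity("Murmur128_x64", Process.CYCLIC, Signedness.UNSIGNED);
        System.out.println(new Shape(1024, 3, a).isCompatibleWith(new Shape(1024, 3, b))); // prints false
    }
}
```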
*Implementation Issues*
Are you saying that all classes in Collections have to be immutable? Or that
the classes noted are not mutable? They are mutable, and we had a discussion
about this in which it was noted that the standard uses of Bloom filters
require them to be mutable; otherwise significant overhead is imposed to
create new Bloom filters whenever filters are merged (a common operation).
> BloomFilter contribution
> ------------------------
>
> Key: COLLECTIONS-728
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
> Project: Commons Collections
> Issue Type: Task
> Reporter: Claude Warren
> Priority: Minor
> Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java,
> Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated
> collections.