[
https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990798#comment-16990798
]
Claude Warren commented on COLLECTIONS-728:
-------------------------------------------
Hasher names:
The Shape of the Bloom filter depends on several things. One is that the
hashing function is consistent; that is, it uses the same techniques to generate
the hashed values. Two different implementations of a hash function are
the "same" as long as they generate the same values for the same input
(referred to below as the "same function").
Comparing Bloom filters that do not use the same function does not make sense.
Thus a mechanism to distinguish between filters with different hashing is
desired so that the user may be warned when such an attempt is made. This is
where the naming arises. The naming is intended to provide a proxy for the
implementation details as well as provide the user with some idea of what
hashes were used in order to assist in the resolution of the conflict.
You will note that the Shape class equality check verifies that the name is the
same for the same reason.
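
As a rough illustration of that equality check (a minimal sketch, not the code
in the contribution; the class name ShapeSketch, its fields, the
verifyComparable helper and the hash names in the usage note are all
hypothetical):

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for the Shape class discussed above. */
final class ShapeSketch {
    private final String hashFunctionName; // proxy for the hashing implementation
    private final int numberOfBits;
    private final int numberOfHashFunctions;

    ShapeSketch(String hashFunctionName, int numberOfBits, int numberOfHashFunctions) {
        this.hashFunctionName = Objects.requireNonNull(hashFunctionName);
        this.numberOfBits = numberOfBits;
        this.numberOfHashFunctions = numberOfHashFunctions;
    }

    /** Two shapes are equal only if the hash function name matches as well. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ShapeSketch)) {
            return false;
        }
        ShapeSketch other = (ShapeSketch) o;
        return numberOfBits == other.numberOfBits
                && numberOfHashFunctions == other.numberOfHashFunctions
                && hashFunctionName.equals(other.hashFunctionName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(hashFunctionName, numberOfBits, numberOfHashFunctions);
    }

    /** Warns the user (by throwing) and reports both names to help resolve the conflict. */
    static void verifyComparable(ShapeSketch a, ShapeSketch b) {
        if (!a.equals(b)) {
            throw new IllegalArgumentException(
                    "Shapes differ: hash '" + a.hashFunctionName
                            + "' vs hash '" + b.hashFunctionName + "'");
        }
    }
}
```

With this sketch, two shapes built with "ExampleHash-A" and "ExampleHash-B"
but otherwise identical are not equal, and verifyComparable reports both names.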
Similar to the Java Cryptography Architecture (JCA), two providers may provide
implementations of the same hash function. Users assume that the JCA
implementation has been vetted and is correctly implemented. In the Bloom
filter case an improperly implemented hash is only a serious issue in
cross-application communication.
Note that the code in this contribution does not require the name format be
followed, only that different implementations be named differently. I do have
a Caching Hasher implementation in another application that requires a cyclic
hash function and will fail if the hash name does not match that presented here.
Perhaps it makes more sense to have a name and the booleans for cyclic/iterated
and signed/unsigned. But I don't see a way to provide users with the ability
to use new, old or broken hash functions and be able to evaluate if the same
function is being used without using a name.
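
One possible shape for the "name plus booleans" idea, purely as a sketch (the
class HashIdentitySketch and its methods are hypothetical and are not part of
the contribution):

```java
import java.util.Objects;

/** Hypothetical identity record: a free-form name plus two boolean traits. */
final class HashIdentitySketch {
    private final String name;    // free-form, chosen by the implementer
    private final boolean cyclic; // true = cyclic, false = iterated
    private final boolean signed; // true = signed, false = unsigned

    HashIdentitySketch(String name, boolean cyclic, boolean signed) {
        this.name = Objects.requireNonNull(name);
        this.cyclic = cyclic;
        this.signed = signed;
    }

    boolean isCyclic() {
        return cyclic;
    }

    /** The name still decides identity; the booleans only describe the process. */
    boolean isSameFunction(HashIdentitySketch other) {
        return name.equals(other.name)
                && cyclic == other.cyclic
                && signed == other.signed;
    }

    /** What a caching hasher that needs a cyclic function might check. */
    static void requireCyclic(HashIdentitySketch id) {
        if (!id.isCyclic()) {
            throw new IllegalArgumentException(
                    "Hash '" + id.name + "' is not cyclic");
        }
    }
}
```

Even in this form the name cannot be dropped: two unrelated functions can both
be cyclic and signed, so the booleans alone do not tell them apart.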
In my mind Enums are appropriate in two basic conditions: 1) you know all the
possible values; or 2) you want to tightly control the acceptable values.
Neither of these conditions applies in the case of hash function identification.
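
To make the contrast with an Enum concrete, here is a small sketch in which
any developer can plug in a hash function and give it a name without touching
a central list of values. The interface NamedHashSketch, the class
NamedHashDemo and the name "LegacySum-example" are hypothetical:

```java
import java.util.function.ToLongFunction;

/** Hypothetical interface: any implementation can announce its own name. */
interface NamedHashSketch extends ToLongFunction<byte[]> {
    String getName();
}

final class NamedHashDemo {
    /** A deliberately weak hash a user might need to support for old data. */
    static final NamedHashSketch LEGACY_SUM = new NamedHashSketch() {
        @Override
        public String getName() {
            return "LegacySum-example";
        }

        @Override
        public long applyAsLong(byte[] data) {
            long sum = 0;
            for (byte b : data) {
                sum += b & 0xFF; // treat each byte as unsigned
            }
            return sum;
        }
    };

    /** Identity is judged by name, so no closed list of functions is needed. */
    static boolean sameFunction(NamedHashSketch a, NamedHashSketch b) {
        return a.getName().equals(b.getName());
    }
}
```

An Enum would require every such function to be known when the Enum is
compiled; the name-based check has no such requirement.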
addendum:
In reading back over the previous post I was struck by the use of the term
"library". Perhaps this is just a "nomenclature" thing or perhaps it is a
"conceptual" thing. I see the Bloom filter contribution as a "framework", a
scaffolding on which other developers may hang new implementations and tweaks.
In my mind a "library" is generally something that is used without modification
or extension. I just wanted to ensure that we have the same "vision" of this
contribution.
> BloomFilter contribution
> ------------------------
>
> Key: COLLECTIONS-728
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
> Project: Commons Collections
> Issue Type: Task
> Reporter: Claude Warren
> Priority: Minor
> Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java,
> Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated
> collections.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)