[ 
https://issues.apache.org/jira/browse/COLLECTIONS-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claude Warren updated COLLECTIONS-824:
--------------------------------------
    Description: 
The wikipedia article on [Double Hashing 
(variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that 
this type of hasher when used in Bloom filters will result in a collision rate 
that is twice as likely as intended.

An alternative is to provide enhanced double hashing which changes the 
increment each loop iteration. This would require some additional work on 
wrapping the modulus of the increment.

It may be better to change the name to KMHasher and more explicitly state that 
the hash functions are:


 

g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]
 

  was:
The wikipedia article on [Double Hashing 
(variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that 
this type of hasher when used in Bloom filters will result in a collision rate 
that is twice as likely as intended.

An alternative is to provide enhanced double hashing which changes the 
increment each loop iteration. This would require some additional work on 
wrapping the modulus of the increment.

It may be better to change the name to KMHasher and more explicitly state that 
the hash functions are:
 {{g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]}}
 


> BloomFilter: change name of SimpleHasher
> ----------------------------------------
>
>                 Key: COLLECTIONS-824
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-824
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Major
>              Labels: bloom-filter
>
> The wikipedia article on [Double Hashing 
> (variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that 
> this type of hasher when used in Bloom filters will result in a collision 
> rate that is twice as likely as intended.
> An alternative is to provide enhanced double hashing which changes the 
> increment each loop iteration. This would require some additional work on 
> wrapping the modulus of the increment.
> It may be better to change the name to KMHasher and more explicitly state 
> that the hash functions are:
>  
> g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to