Claudenw commented on issue #83: WIP: Initial bloom filter code contribution
URL: 
https://github.com/apache/commons-collections/pull/83#issuecomment-545770968
 
 
   Ok, first I think there is a misunderstanding.  In most cases when
   constructing the Bloom filter there will be more than one hash per object
   inserted.  Using the bloom filter calculator
   https://hur.st/bloomfilter/?n=10000&p=50000&m=&k= for 10K items with a
   collision probability of 1 in 50K yields a Bloom filter with 225200 bits of
   which 16 are turned on for any one item.
   
   Thus you have to generate 16 hash functions for the object and look for all
   16 bits.  The absolute fastest way to do this i to turn  the 225,200 bits
   into 3519 longs and do the logical OR between the two Bloom filters.
   
   There are strategies to try to reduce the number of longs needed to do the
   comparison using various compression strategies like EWAH. (
   https://github.com/lemire/javaewah) but the operational steps are the same.
   
   Claude
   
   On Thu, Oct 24, 2019 at 3:15 AM SilverNarcissus <[email protected]>
   wrote:
   
   > Let's make the situation more clear. In my consideration, I have a bloom
   > filter (just call it b) which comprised of the merge of several thousand
   > (lets say 10,000) filters. And this is already done and I really care about
   > the query performance. The query is like this:
   > Given an object, check whether it in b.
   > In my opinion, there is no need to build a new bloom filter for every
   > object and compare it with b. We only need to check current object's hash
   > location in b is 1.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > 
<https://github.com/apache/commons-collections/pull/83?email_source=notifications&email_token=AASTVHXGAJDQXKKVH2J4223QQEAMDA5CNFSM4IWGTGA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECDOLBA#issuecomment-545711492>,
   > or unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AASTVHQERTYYKN7MUJSPMVLQQEAMDANCNFSM4IWGTGAQ>
   > .
   >
   
   
   -- 
   I like: Like Like - The likeliest place on the web
   <http://like-like.xenei.com>
   LinkedIn: http://www.linkedin.com/in/claudewarren
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to