Claudenw commented on issue #83: WIP: Initial bloom filter code contribution URL: https://github.com/apache/commons-collections/pull/83#issuecomment-545770968 Ok, first I think there is a misunderstanding. In most cases when constructing the Bloom filter there will be more than one hash per object inserted. Using the bloom filter calculator https://hur.st/bloomfilter/?n=10000&p=50000&m=&k= for 10K items with a collision probability of 1 in 50K yields a Bloom filter with 225200 bits of which 16 are turned on for any one item. Thus you have to generate 16 hash functions for the object and look for all 16 bits. The absolute fastest way to do this i to turn the 225,200 bits into 3519 longs and do the logical OR between the two Bloom filters. There are strategies to try to reduce the number of longs needed to do the comparison using various compression strategies like EWAH. ( https://github.com/lemire/javaewah) but the operational steps are the same. Claude On Thu, Oct 24, 2019 at 3:15 AM SilverNarcissus <[email protected]> wrote: > Let's make the situation more clear. In my consideration, I have a bloom > filter (just call it b) which comprised of the merge of several thousand > (lets say 10,000) filters. And this is already done and I really care about > the query performance. The query is like this: > Given an object, check whether it in b. > In my opinion, there is no need to build a new bloom filter for every > object and compare it with b. We only need to check current object's hash > location in b is 1. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <https://github.com/apache/commons-collections/pull/83?email_source=notifications&email_token=AASTVHXGAJDQXKKVH2J4223QQEAMDA5CNFSM4IWGTGA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECDOLBA#issuecomment-545711492>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AASTVHQERTYYKN7MUJSPMVLQQEAMDANCNFSM4IWGTGAQ> > . > -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
