Claudenw commented on issue #83: WIP: Initial bloom filter code contribution URL: https://github.com/apache/commons-collections/pull/83#issuecomment-543048050 If you were not to build a bloom filter what would you do to check if something was in the filter? I am going to guess at an answer here. 1) create a bit array 2) do some hashing of the object and turn on bits in the array 3) execute [ array & filter == array ] test to see if it was in there. In essence you build a bloom filter. These same steps are done when you create a bloom filter instance in the class So yes you have to build a bloom filter but the overhead is virtually non-existent as using the builder pattern removes/replaces steps from your process: Assumeing partA partB and partC of the object are to be hashed, the steps using the library are: // assuming 10K entries in the filter and 1 in 50K collision rate. This only needs to be build once. BloomFilterConfiguration config = new BloomFIlterConfiguration( 10000, 1.0/50000 );/ BloomFilter bf = new StandardBloomFilter( ProtoBloomFilter.builder().with( obj.partA ).with( obj.partB ).with( obj.partC).build(), config ); So the over head for comparisson is minimal. The filters are immutable because if you pass them as parameters you want to ensure that they are not modified by external operations (either intentionally or through error). In most use cases once the filter is constructed it is used in one or more comparisons. The only time a filter is modified is when another filter is added to the collection. In my experience the rate at which you add filters to collections is much lower than the rate at which you compare them. The other reason is that the equality and the hashCode of a filter are based on the bits that are on. So leaving them mutable would mean that the filters could not be used in a Hash based Set, because the hash would change after insertion when a new object was added. Claude Claude On Thu, Oct 17, 2019 at 2:20 AM SilverNarcissus <[email protected]> wrote: > Hi~ Thanks for your work! I have a question about you use immutable bloom > filter and if we want to judge whether an object is in a bloom filter, we > have to build a new bloom filter and compare them. There may be a > performance impact. How come you choose this design? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <https://github.com/apache/commons-collections/pull/83?email_source=notifications&email_token=AASTVHUGX56CGN73LMHN2IDQO64WXA5CNFSM4IWGTGA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBONO2A#issuecomment-542955368>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AASTVHSGNND3VDZLN72RDBLQO64WXANCNFSM4IWGTGAQ> > . > -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
