Claudenw commented on issue #83: WIP: Initial bloom filter code contribution URL: https://github.com/apache/commons-collections/pull/83#issuecomment-545860977 On Thu, Oct 24, 2019 at 11:32 AM Alex Herbert <[email protected]> wrote: > I think the point is: > > "either test or set k bits using positions chosen using hash functions. > ... > if any of the bits are not set, the element certainly does not exist." > > So you can do the comparison with the first hashing function, then the > next, etc, until you have processed all the hashing functions. At any point > if a match is not found you can stop thus allowing early exit of your > entire comparison. > > If the hashing function outputs a single bit position then I can see value > in partial computation of the entire combined hash. > > So this is a question of whether part of the entire hash can be computed > in a stage and used against the full bits of the Bloom filter. In your > example for a filter of length 225,000 bits built using 16 hashing > functions are any of these true: > > 1. Each hashing function identifies only 1 bit > > TRUE > > 1. Each hashing function identifies bits from a non-overlapping range > of the 225,000 bits > > FALSE. If I understand your statement correctly. The hash functions can return the same bit. In the worst case you would get the same bit for all the hash functions. In reality you probably get somewhere between 12 and 16 bits. > > > In the example below the hashing function identifies 2 bits. For target A > you only need to check the first bit to know that the object is not in the > filter. For target B you have to check both. > > Bloom filter: > -----------1--------1------11---------1---------------11---------------1--- > > Target A (not present): > --1-----------------------------------------------------------------------1 > > Target B (may be present): > -----------1--------------------------1------------------------------------ > > Yes you can perform the check as you indicate, but I maintain it will take longer to evaluate. Claude
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
