aherbert edited a comment on issue #83: WIP: Initial bloom filter code contribution URL: https://github.com/apache/commons-collections/pull/83#issuecomment-545903414 If the 225,000 bits of a bloom filter are 3519 longs then you have 3519 ((filter & target) == target) operations with possible early exit: ``` long[] filter = ... long[] target = ... for (int i = 0; i < filter.length; i++) { if ((filter[i] & target[i]) != target[i]) { return NO_MATCH; } } return MATCH; ``` If the expected number of bits turned on is 16 this seems wasteful, but I've not worked through the probability distribution of bits and the expected number of loop executions. The key point is that we can skip all the zeros in the target. Since this: ``` if ((filter[i] & target[i]) != target[i]) ``` is always false when the target is zero. It is only when the target is not zero that a result is of interest. If you know that you only have 16 bits to compare you just isolate those: ``` // Dynamically computed bits of the hash function IntStream bits = ... boolean noMatch = bits.filter(bit -> { long value = filter[bit / 64]; // Check the bit is not set return (value & (1L << (bit & 63))) == 0; }).findAny().isPresent(); ``` I've used the streams API here but you could just compute each hash bit in a loop: ``` for (int i = 0; i < 16; i++) { // Dynamically computed bits of the hash function int bit = computeHash(i); long value = filter[bit / 64]; // Check the bit is not set if (value & (1L << (bit & 63))) == 0) { return NO_MATCH; } } return MATCH; ``` There are more operations per loop with the shifts used to isolate the correct bit but only 16 possible loops.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
