[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
* Jim Snow [EMAIL PROTECTED] [080531 15:23]: Without looking at the code to verify that this is how it has really been implemented, a bloom filter is like a series of hash tables, where the hash table entries are one bit. The bit is set if there is an item that hashes to that value in the bloom filter. So, assuming a single hash table where half the bits are set, there's a 50% false positive rate and no false negatives when you do a membership test. To reduce the false positives, we can add another hash table with a different hash function. Assuming it also is half full, we can check if an item is in both tables, and our false positive rate drops to 25%. In practice, one might use something like 32 hash tables. This yields a false positive rate of 1/(2^32). Their most obvious application is to store the dictionary for a spell checker in a space-efficient way, though I have a friend who wrote a paper on using them for router caches. There was a google tech talk on bloom filters awhile ago: http://www.youtube.com/watch?v=947gWqwkhu0 One other use: LOAF is a simple extension to email that lets you append your entire address book to outgoing mail message without compromising your privacy. Correspondents can use this information to prioritize their mail, and learn more about their social networks. The LOAF home page is at http://loaf.cantbedone.org. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
On Sat, May 31, 2008 at 6:22 PM, Jim Snow [EMAIL PROTECTED] wrote: In practice, one might use something like 32 hash tables. This yields a false positive rate of 1/(2^32). Their most obvious application is to store the dictionary for a spell checker in a space-efficient way, though I have a friend who wrote a paper on using them for router caches. One minor technicality is that you don't actually use k separate hash tables. You use k separate hash functions, and hash using different functions into the same physical table with a goal of having approximately half of the bits in the table set when all of your data is hashed. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
G'day all. Quoting Achim Schneider [EMAIL PROTECTED]: Please tell me that this isn't reversible. It isn't reversible. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote: Bryan O'Sullivan [EMAIL PROTECTED] wrote: A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) /me squints. Please tell me that this isn't reversible. Tell me what you mean by reversible. You can't, for instance, extract the items in the set. -- Aaron Denney -- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
Aaron Denney [EMAIL PROTECTED] wrote: On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote: Bryan O'Sullivan [EMAIL PROTECTED] wrote: A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) /me squints. Please tell me that this isn't reversible. Tell me what you mean by reversible. You can't, for instance, extract the items in the set. I guess invertible would have been the right word, though it's still ambiguous. Turning it into something that does not give false positives, but has a tunable false negative rate. Without looking at the algorithm, I imagine it working somewhat like a hashtable, and this inversion would utterly destroy my intuition. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
Achim Schneider wrote: Aaron Denney [EMAIL PROTECTED] wrote: On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote: Bryan O'Sullivan [EMAIL PROTECTED] wrote: A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) /me squints. Please tell me that this isn't reversible. Tell me what you mean by reversible. You can't, for instance, extract the items in the set. I guess invertible would have been the right word, though it's still ambiguous. Turning it into something that does not give false positives, but has a tunable false negative rate. Without looking at the algorithm, I imagine it working somewhat like a hashtable, and this inversion would utterly destroy my intuition. Without looking at the code to verify that this is how it has really been implemented, a bloom filter is like a series of hash tables, where the hash table entries are one bit. The bit is set if there is an item that hashes to that value in the bloom filter. So, assuming a single hash table where half the bits are set, there's a 50% false positive rate and no false negatives when you do a membership test. To reduce the false positives, we can add another hash table with a different hash function. Assuming it also is half full, we can check if an item is in both tables, and our false positive rate drops to 25%. In practice, one might use something like 32 hash tables. This yields a false positive rate of 1/(2^32). Their most obvious application is to store the dictionary for a spell checker in a space-efficient way, though I have a friend who wrote a paper on using them for router caches. There was a google tech talk on bloom filters awhile ago: http://www.youtube.com/watch?v=947gWqwkhu0 -jim ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
Bryan O'Sullivan [EMAIL PROTECTED] wrote: A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) /me squints. Please tell me that this isn't reversible. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe