[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-06-03 Thread Scott Cruzen
* Jim Snow [EMAIL PROTECTED] [080531 15:23]:
 Without looking at the code to verify that this is how it has really  
 been implemented, a bloom filter is like a series of hash tables, where  
 the hash table entries are one bit.  The bit is set if there is an item  
 that hashes to that value in the bloom filter.  So, assuming a single  
 hash table where half the bits are set, there's a 50% false positive  
 rate and no false negatives when you do a membership test.

 To reduce the false positives, we can add another hash table with a  
 different hash function.  Assuming it also is half full, we can check if  
 an item is in both tables, and our false positive rate drops to 25%.

 In practice, one might use something like 32 hash tables.  This yields a  
 false positive rate of 1/(2^32).  Their most obvious application is to  
 store the dictionary for a spell checker in a space-efficient way,  
 though I have a friend who wrote a paper on using them for router caches.

 There was a google tech talk on bloom filters awhile ago:  
 http://www.youtube.com/watch?v=947gWqwkhu0

One other use:

LOAF is a simple extension to email that lets you append your entire
address book to outgoing mail message without compromising your privacy.
Correspondents can use this information to prioritize their mail, and
learn more about their social networks. The LOAF home page is at
http://loaf.cantbedone.org.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-06-02 Thread Edward Kmett
On Sat, May 31, 2008 at 6:22 PM, Jim Snow [EMAIL PROTECTED] wrote:
 In practice, one might use something like 32 hash tables.  This yields a
 false positive rate of 1/(2^32).  Their most obvious application is to store
 the dictionary for a spell checker in a space-efficient way, though I have a
 friend who wrote a paper on using them for router caches.

One minor technicality is that you don't actually use k separate hash
tables. You use k separate hash functions, and hash using different
functions into the same physical table with a goal of having
approximately half of
the bits in the table set when all of your data is hashed.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread ajb

G'day all.

Quoting Achim Schneider [EMAIL PROTECTED]:


Please tell me that this isn't reversible.


It isn't reversible.

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread Aaron Denney
On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote:
 Bryan O'Sullivan [EMAIL PROTECTED] wrote:

 A Bloom filter is a probabilistic data
 structure that provides a fast set membership querying capability.
 It does not give false negatives, but has a tunable false positive
 rate.  (A false positive arises when the filter claims that an
 element is present, but in fact it is not.)
 
 /me squints.

 Please tell me that this isn't reversible.

Tell me what you mean by reversible.  You can't, for instance,
extract the items in the set.

-- 
Aaron Denney
--

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread Achim Schneider
Aaron Denney [EMAIL PROTECTED] wrote:

 On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote:
  Bryan O'Sullivan [EMAIL PROTECTED] wrote:
 
  A Bloom filter is a probabilistic data
  structure that provides a fast set membership querying capability.
  It does not give false negatives, but has a tunable false positive
  rate.  (A false positive arises when the filter claims that an
  element is present, but in fact it is not.)
  
  /me squints.
 
  Please tell me that this isn't reversible.
 
 Tell me what you mean by reversible.  You can't, for instance,
 extract the items in the set.
 
I guess invertible would have been the right word, though it's still
ambiguous.

Turning it into something that does not give false positives, but has a
tunable false negative rate.

Without looking at the algorithm, I imagine it working somewhat like a
hashtable, and this inversion would utterly destroy my intuition.

-- 
(c) this sig last receiving data processing entity. Inspect headers for
past copyright information. All rights reserved. Unauthorised copying,
hiring, renting, public performance and/or broadcasting of this
signature prohibited. 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread Jim Snow

Achim Schneider wrote:

Aaron Denney [EMAIL PROTECTED] wrote:

  

On 2008-05-30, Achim Schneider [EMAIL PROTECTED] wrote:


Bryan O'Sullivan [EMAIL PROTECTED] wrote:

  

A Bloom filter is a probabilistic data
structure that provides a fast set membership querying capability.
It does not give false negatives, but has a tunable false positive
rate.  (A false positive arises when the filter claims that an
element is present, but in fact it is not.)



/me squints.

Please tell me that this isn't reversible.
  

Tell me what you mean by reversible.  You can't, for instance,
extract the items in the set.



I guess invertible would have been the right word, though it's still
ambiguous.

Turning it into something that does not give false positives, but has a
tunable false negative rate.

Without looking at the algorithm, I imagine it working somewhat like a
hashtable, and this inversion would utterly destroy my intuition.
  
Without looking at the code to verify that this is how it has really 
been implemented, a bloom filter is like a series of hash tables, where 
the hash table entries are one bit.  The bit is set if there is an item 
that hashes to that value in the bloom filter.  So, assuming a single 
hash table where half the bits are set, there's a 50% false positive 
rate and no false negatives when you do a membership test.


To reduce the false positives, we can add another hash table with a 
different hash function.  Assuming it also is half full, we can check if 
an item is in both tables, and our false positive rate drops to 25%.


In practice, one might use something like 32 hash tables.  This yields a 
false positive rate of 1/(2^32).  Their most obvious application is to 
store the dictionary for a spell checker in a space-efficient way, 
though I have a friend who wrote a paper on using them for router caches.


There was a google tech talk on bloom filters awhile ago: 
http://www.youtube.com/watch?v=947gWqwkhu0


-jim
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-30 Thread Achim Schneider
Bryan O'Sullivan [EMAIL PROTECTED] wrote:

 A Bloom filter is a probabilistic data
 structure that provides a fast set membership querying capability.
 It does not give false negatives, but has a tunable false positive
 rate.  (A false positive arises when the filter claims that an
 element is present, but in fact it is not.)
 
/me squints.

Please tell me that this isn't reversible.

-- 
(c) this sig last receiving data processing entity. Inspect headers for
past copyright information. All rights reserved. Unauthorised copying,
hiring, renting, public performance and/or broadcasting of this
signature prohibited. 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe