Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-06-03 Thread Ketil Malde
Thomas Hartman [EMAIL PROTECTED] writes:

 What kind of speed do you get on your laptop for Data.Set? How much
 faster is the bloom filter?

I tried to modify examples/Words.hs to use Data.Set insted.  The
results look like this (first Bloom, second Data.Set, both compiled
with -O2):

nmd:..filter/examples % ./Words   
57025 words
0.013326ss to count words
Bloom { 4194304 bits } 
0.050608ss to construct filter
0.034806ss to query every element

nmd:..filter/examples % ./WordsS
57025 words
0.013291ss to count words
False
0.755115ss to construct filter
0.423289ss to query every element

In order to avoid printing the entire set, while still evaluating it,
I replaced the printing of the set with printing the result of a
search for a non-existing element - I should really use a strict
insert, I guess.  Anyway, this hopefully gives an indication - looks
like a factor of 10 in this case, but it will depend on the size of
the data - more data, greater improvement.

BTW, Nice work, Bryan!  I have plans for this.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread David MacIver
On Fri, May 30, 2008 at 11:30 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote:
 I'm pleased to announce the availability of a fast Bloom filter library
 for Haskell.  A Bloom filter is a probabilistic data structure that
 provides a fast set membership querying capability.  It does not give
 false negatives, but has a tunable false positive rate.  (A false
 positive arises when the filter claims that an element is present, but
 in fact it is not.)

 The library is easy to use.  As an example, here's a reimplementation of
 the Unix spell command.

 import Data.BloomFilter.Easy (easyList, elemB)

 main = do
  filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt
  let check word | word `elemB` filt = 
 | otherwise = word ++ \n
  interact (concat . map check . lines)

 It is also carefully tuned for performance.  On my laptop, I can sustain
 a construction or query rate well in excess of a million ByteStrings per
 second.

 Source code:

 http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter

 Latest bits:

 darcs get http://darcs.serpentine.com/bloomfilter

The Hashable stuff in there looks like it might be independently
useful. Any interest in splitting it out into an independent package
or is it really intended to be something fairly specific to the Bloom
filter implementation?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread Bryan O'Sullivan
David MacIver wrote:

 The Hashable stuff in there looks like it might be independently
 useful.

Probably, yes.

 Any interest in splitting it out into an independent package
 or is it really intended to be something fairly specific to the Bloom
 filter implementation?

I'll split them if there's a request, but time constraints must alas
prevail in the face of mere polite interest :-)

b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread David MacIver
On Sat, May 31, 2008 at 4:09 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote:
 David MacIver wrote:

 The Hashable stuff in there looks like it might be independently
 useful.

 Probably, yes.

 Any interest in splitting it out into an independent package
 or is it really intended to be something fairly specific to the Bloom
 filter implementation?

 I'll split them if there's a request, but time constraints must alas
 prevail in the face of mere polite interest :-)

Seems fair enough to me. I was indeed just curious. :-) I'll shout if
I have an actual use case (or possibly just include the whole thing.
It's not exactly a major size burden)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-31 Thread Thomas Hartman
What kind of speed do you get on your laptop for Data.Set? How much
faster is the bloom filter?

thomas.

2008/5/30 Bryan O'Sullivan [EMAIL PROTECTED]:
 I'm pleased to announce the availability of a fast Bloom filter library
 for Haskell.  A Bloom filter is a probabilistic data structure that
 provides a fast set membership querying capability.  It does not give
 false negatives, but has a tunable false positive rate.  (A false
 positive arises when the filter claims that an element is present, but
 in fact it is not.)

 The library is easy to use.  As an example, here's a reimplementation of
 the Unix spell command.

 import Data.BloomFilter.Easy (easyList, elemB)

 main = do
  filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt
  let check word | word `elemB` filt = 
 | otherwise = word ++ \n
  interact (concat . map check . lines)

 It is also carefully tuned for performance.  On my laptop, I can sustain
 a construction or query rate well in excess of a million ByteStrings per
 second.

 Source code:

 http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter

 Latest bits:

 darcs get http://darcs.serpentine.com/bloomfilter

b
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters

2008-05-30 Thread Bryan O'Sullivan
I'm pleased to announce the availability of a fast Bloom filter library
for Haskell.  A Bloom filter is a probabilistic data structure that
provides a fast set membership querying capability.  It does not give
false negatives, but has a tunable false positive rate.  (A false
positive arises when the filter claims that an element is present, but
in fact it is not.)

The library is easy to use.  As an example, here's a reimplementation of
the Unix spell command.

import Data.BloomFilter.Easy (easyList, elemB)

main = do
  filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt
  let check word | word `elemB` filt = 
 | otherwise = word ++ \n
  interact (concat . map check . lines)

It is also carefully tuned for performance.  On my laptop, I can sustain
a construction or query rate well in excess of a million ByteStrings per
second.

Source code:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter

Latest bits:

darcs get http://darcs.serpentine.com/bloomfilter

b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe