Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
Thomas Hartman [EMAIL PROTECTED] writes: What kind of speed do you get on your laptop for Data.Set? How much faster is the bloom filter? I tried to modify examples/Words.hs to use Data.Set insted. The results look like this (first Bloom, second Data.Set, both compiled with -O2): nmd:..filter/examples % ./Words 57025 words 0.013326ss to count words Bloom { 4194304 bits } 0.050608ss to construct filter 0.034806ss to query every element nmd:..filter/examples % ./WordsS 57025 words 0.013291ss to count words False 0.755115ss to construct filter 0.423289ss to query every element In order to avoid printing the entire set, while still evaluating it, I replaced the printing of the set with printing the result of a search for a non-existing element - I should really use a strict insert, I guess. Anyway, this hopefully gives an indication - looks like a factor of 10 in this case, but it will depend on the size of the data - more data, greater improvement. BTW, Nice work, Bryan! I have plans for this. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
On Fri, May 30, 2008 at 11:30 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote: I'm pleased to announce the availability of a fast Bloom filter library for Haskell. A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) The library is easy to use. As an example, here's a reimplementation of the Unix spell command. import Data.BloomFilter.Easy (easyList, elemB) main = do filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt let check word | word `elemB` filt = | otherwise = word ++ \n interact (concat . map check . lines) It is also carefully tuned for performance. On my laptop, I can sustain a construction or query rate well in excess of a million ByteStrings per second. Source code: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter Latest bits: darcs get http://darcs.serpentine.com/bloomfilter The Hashable stuff in there looks like it might be independently useful. Any interest in splitting it out into an independent package or is it really intended to be something fairly specific to the Bloom filter implementation? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
David MacIver wrote: The Hashable stuff in there looks like it might be independently useful. Probably, yes. Any interest in splitting it out into an independent package or is it really intended to be something fairly specific to the Bloom filter implementation? I'll split them if there's a request, but time constraints must alas prevail in the face of mere polite interest :-) b ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
On Sat, May 31, 2008 at 4:09 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote: David MacIver wrote: The Hashable stuff in there looks like it might be independently useful. Probably, yes. Any interest in splitting it out into an independent package or is it really intended to be something fairly specific to the Bloom filter implementation? I'll split them if there's a request, but time constraints must alas prevail in the face of mere polite interest :-) Seems fair enough to me. I was indeed just curious. :-) I'll shout if I have an actual use case (or possibly just include the whole thing. It's not exactly a major size burden) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
What kind of speed do you get on your laptop for Data.Set? How much faster is the bloom filter? thomas. 2008/5/30 Bryan O'Sullivan [EMAIL PROTECTED]: I'm pleased to announce the availability of a fast Bloom filter library for Haskell. A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) The library is easy to use. As an example, here's a reimplementation of the Unix spell command. import Data.BloomFilter.Easy (easyList, elemB) main = do filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt let check word | word `elemB` filt = | otherwise = word ++ \n interact (concat . map check . lines) It is also carefully tuned for performance. On my laptop, I can sustain a construction or query rate well in excess of a million ByteStrings per second. Source code: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter Latest bits: darcs get http://darcs.serpentine.com/bloomfilter b ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [ANN] bloomfilter 1.0 - Fast immutable and mutable Bloom filters
I'm pleased to announce the availability of a fast Bloom filter library for Haskell. A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.) The library is easy to use. As an example, here's a reimplementation of the Unix spell command. import Data.BloomFilter.Easy (easyList, elemB) main = do filt - (easyList 0.01 . words) `fmap` readFile dictionary.txt let check word | word `elemB` filt = | otherwise = word ++ \n interact (concat . map check . lines) It is also carefully tuned for performance. On my laptop, I can sustain a construction or query rate well in excess of a million ByteStrings per second. Source code: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter Latest bits: darcs get http://darcs.serpentine.com/bloomfilter b ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe