[Factor-talk] new bloom-filters vocab

Alec Berryman Thu, 07 May 2009 20:22:19 -0700

I implemented a Bloom filters vocab:

  git://github.com/alec/factor.git in the bloom-filters branch


It's still a bit rough around the edges, but it's usable and has both
tests and documentation.  Any feedback is appreciated; if it looks
useful, please pull it into Factor.

On a 1.4GHz 32-bit Pentium M, I can create a filter from the ~100k words
in /usr/share/dict/words in about a second and look them all back up in
about the same amount of time.  The false positive rate is ~10x what the
math predicts it should be; there are some notes in the code about how
that could be improved.
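
For reference, "what the math predicts" is usually taken to mean the
standard approximation for a Bloom filter with m bits, k hash functions
and n inserted items: p = (1 - e^(-k*n/m))^k.  The sketch below is in
Python rather than Factor and is not taken from the bloom-filters
branch; the function names and the 1% target rate are assumptions
chosen for illustration.

```python
import math


def expected_false_positive_rate(n_items, m_bits, k_hashes):
    """Standard approximation: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_parameters(n_items, target_rate):
    """Return (m_bits, k_hashes) sized for a target false positive rate.

    Uses the textbook formulas m = -n*ln(p) / (ln 2)^2 and k = (m/n)*ln 2.
    """
    m_bits = math.ceil(-n_items * math.log(target_rate) / (math.log(2) ** 2))
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


# Roughly the size of /usr/share/dict/words, with an assumed 1% target.
m, k = optimal_parameters(100_000, 0.01)
rate = expected_false_positive_rate(100_000, m, k)
print(f"bits={m} hashes={k} predicted rate={rate:.4f}")
```

Comparing a measured rate against this prediction only makes sense if
the hash functions behave close to independently; correlated hashes are
one common reason for a measured rate well above the predicted one.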


I have a question on error handling.  If my math is right,
max-array-capacity on linux-x86-32 means that the largest bit-array I
can create is about 16MB.  That's a lot of bits, but not so many that a
large filter couldn't exceed it.  What's the best way to signal to the
user that I can't create something that big?

I see that some array constructors will signal an error from the VM, but
that mechanism doesn't look particularly accessible from my code.  The
other behavior I saw was from the bit-arrays vocab, which will
effectively mod the number of bits requested by max-array-capacity and
return a surprisingly-sized array.  By comparison, SBCL will yell at you
if you try to store a non-fixnum into a fixnum slot; I would find that
behavior useful from Factor.
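
The two behaviors contrasted above can be sketched as follows.  This is
Python, not Factor, and every name in it is hypothetical: MAX_BITS
simply encodes the email's "about 16MB" estimate, and neither function
corresponds to a word in the bit-arrays vocab.

```python
class CapacityError(ValueError):
    """Raised when a requested bit count exceeds the assumed limit."""


# Assumed limit: 16 MB worth of bits, per the estimate in the email.
MAX_BITS = 16 * 1024 * 1024 * 8


def checked_bit_count(requested_bits, max_bits=MAX_BITS):
    """Fail loudly instead of returning a smaller array than asked for."""
    if requested_bits <= 0:
        raise ValueError("bit count must be positive")
    if requested_bits > max_bits:
        raise CapacityError(
            f"cannot allocate {requested_bits} bits; limit is {max_bits}"
        )
    return requested_bits


def wrapped_bit_count(requested_bits, max_bits=MAX_BITS):
    """The silent wrap-around described above, shown only for contrast."""
    return requested_bits % max_bits


try:
    checked_bit_count(MAX_BITS + 5)
except CapacityError as err:
    print("refused:", err)

# The wrapping variant quietly yields a 5-bit request instead.
print("wrapped:", wrapped_bit_count(MAX_BITS + 5))
```

Checking at construction time keeps the failure next to the caller that
picked the size, which matters for a Bloom filter because a silently
smaller bit array changes the false positive rate without any error.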

