I implemented a Bloom filters vocab. It's in the bloom-filters branch of git://github.com/alec/factor.git
It's still a bit rough around the edges, but it's usable and has both tests and documentation. Any feedback is appreciated; if it looks useful, please pull it into Factor.

On a 1.4GHz 32-bit Pentium M, I can create a filter from the ~100k words in /usr/share/dict/words in about a second and look them all back up in about the same amount of time. The false positive rate is ~10x what the math predicts it should be; there are some notes in the code about how that could be improved.

I have a question on error handling. If my math is right, max-array-capacity on linux-x86-32 means that the largest bit-array I can create is about 16MB. That's a lot of bits, but not so many that a user couldn't reasonably ask for more. What's the best way to signal to the user, "I can't create something that big"? I see that some arrays will signal from the VM, but that doesn't look particularly accessible for my code. The other behavior I saw was from the bit-arrays vocab, which will effectively mod the number of bits requested by max-array-capacity and return a surprisingly-sized array. SBCL will yell at you if you try to store a non-fixnum into a fixnum slot; I would find that behavior useful from Factor.
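
For anyone who wants to check the numbers, here is a rough sketch of the sizing math and of the kind of up-front size check I'm asking about. It is in Python rather than Factor and is not the code in the branch. The function names are made up for illustration, the formulas are the textbook ones that assume independent, uniform hashes, and MAX_BITS is just the ~16MB figure above converted to bits, not a value read out of the VM.

```python
import math

# Assumption: the ~16MB bit-array limit estimated in this post, in bits.
MAX_BITS = 16 * 1024 * 1024 * 8


def bloom_parameters(n_items, fp_rate):
    """Textbook sizing: bits m = -n ln(p) / (ln 2)^2, hashes k = (m/n) ln 2."""
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


def expected_fp_rate(n_items, bits, hashes):
    """Predicted false positive rate (1 - e^(-k n / m))^k for ideal hashes."""
    return (1.0 - math.exp(-hashes * n_items / bits)) ** hashes


def checked_bloom_parameters(n_items, fp_rate, max_bits=MAX_BITS):
    """Like bloom_parameters, but refuse sizes the bit array cannot hold."""
    bits, hashes = bloom_parameters(n_items, fp_rate)
    if bits > max_bits:
        # Fail loudly instead of silently wrapping to a smaller array.
        raise ValueError(
            "filter needs %d bits, but at most %d are available" % (bits, max_bits)
        )
    return bits, hashes


if __name__ == "__main__":
    # Roughly the /usr/share/dict/words case: ~100k items, 1% target rate.
    bits, hashes = checked_bloom_parameters(100_000, 0.01)
    print(bits, hashes, expected_fp_rate(100_000, bits, hashes))
```

The point of the check is to raise an error at construction time, before any bits are allocated, rather than return an array of a different size than requested.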
