Hi,

Well, the reason for this plugin is that I want to crawl many sites, but every one of them must be in my list. If this were implemented with regular expressions, the filter would still have to loop over 100K expressions for each URL to find a match, right?
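
With a plain domain list the check is just one hash lookup per URL. Roughly something like the sketch below - only to illustrate the idea, not the plugin's actual code; the class and method names are made up:

import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class DomainListFilter {
    // Hypothetical in-memory list; in practice the ~100K domains would be loaded from a file.
    private final Set<String> allowedHosts = new HashSet<>();

    public DomainListFilter(Iterable<String> hosts) {
        for (String h : hosts) {
            allowedHosts.add(h.toLowerCase());
        }
    }

    /** Returns the URL unchanged if its host is in the list, or null to filter it out. */
    public String filter(String urlString) {
        try {
            String host = new URL(urlString).getHost().toLowerCase();
            // One constant-time set lookup instead of looping over 100K regular expressions.
            return allowedHosts.contains(host) ? urlString : null;
        } catch (Exception e) {
            return null; // malformed URLs are rejected
        }
    }
}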

Regards,

Gal

Andrzej Bialecki wrote:
[EMAIL PROTECTED] wrote:
Hi Gal,

I'm curious about the memory consumption of the cache and the speed of
retrieval of an item from the cache, when the cache has 100k domains in
it.

Slightly off-topic, but I hope this is relevant to the original reason for creating this plugin...

There is a BSD-licensed library that implements a large subset of regexps and is based on finite automata. It is reported to be scalable and very fast (the published benchmarks are certainly impressive):

    http://www.brics.dk/~amoeller/automaton/

I suggest doing some tests with 100k regexps to see if it survives.
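
Such a test could be as simple as building the union of all patterns once and then matching each URL against the resulting automaton, along these lines (a rough sketch assuming the library's RegExp/Automaton/RunAutomaton classes; the sample patterns are made up):

import java.util.List;

import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.RunAutomaton;

public class AutomatonFilterTest {
    public static void main(String[] args) {
        // Made-up sample patterns; a real test would load the 100k expressions from a file.
        List<String> patterns = List.of(
                "http://(www\\.)?example\\.com/.*",
                "http://(www\\.)?example\\.org/.*");

        // Combine all patterns into a single automaton, paying the construction cost once.
        Automaton combined = Automaton.makeEmpty();
        for (String p : patterns) {
            combined = combined.union(new RegExp(p).toAutomaton());
        }
        combined.determinize();
        combined.minimize();

        // RunAutomaton is a table-driven matcher: each URL is checked in a single pass,
        // independent of how many patterns went into the union.
        RunAutomaton matcher = new RunAutomaton(combined);
        System.out.println(matcher.run("http://www.example.com/index.html")); // true
        System.out.println(matcher.run("http://somewhere.else.net/"));        // false
    }
}

The interesting numbers to collect would be the size of the minimized automaton and how long the union/minimize step takes for 100k patterns; the per-URL match itself should stay fast regardless of the pattern count.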
