Hi,
Well, the reason for this plugin is that I wish to crawl many sites, but
they must all be on my list. If the filter were implemented with regular
expressions, it would still have to loop over 100K expressions for each
URL to find a match, right?
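To illustrate the point above: a hash-based lookup of the URL's host costs the same no matter how many domains are on the list, while a naive regexp filter scans the whole pattern list per URL. This is only a minimal sketch (the class and method names are made up for illustration, not Nutch's actual plugin API):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a domain-whitelist URL filter. A HashSet lookup
// is O(1) per URL regardless of whether the list holds 100 or 100K
// domains, whereas matching each URL against 100K separate regular
// expressions costs time proportional to the list size.
public class DomainFilterSketch {
    private final Set<String> allowedDomains = new HashSet<>();

    public void addDomain(String domain) {
        allowedDomains.add(domain.toLowerCase());
    }

    // Returns the URL unchanged if its host is on the list, null otherwise
    // (returning null to reject follows the usual URL-filter convention).
    public String filter(String url) {
        try {
            String host = new java.net.URL(url).getHost().toLowerCase();
            return allowedDomains.contains(host) ? url : null;
        } catch (java.net.MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DomainFilterSketch f = new DomainFilterSketch();
        f.addDomain("example.com");
        System.out.println(f.filter("http://example.com/page") != null); // true
        System.out.println(f.filter("http://other.org/page") != null);   // false
    }
}
```

The memory side of the question still stands, of course: 100K domain strings in a HashSet is the price paid for the constant-time lookup.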
Regards,
Gal
Andrzej Bialecki wrote:
[EMAIL PROTECTED] wrote:
Hi Gal,
I'm curious about the memory consumption of the cache and the speed of
retrieving an item from it when the cache holds 100k domains.
Slightly off-topic, but I hope this is relevant to the original reason
for creating this plugin...
There is a BSD-licensed library that implements a large subset of
regexps, which is based on finite automata. It is reported to be
scalable and very fast; the published benchmarks are certainly impressive:
http://www.brics.dk/~amoeller/automaton/
I suggest running some tests with 100k regexps to see if it survives.
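The key trick such automaton libraries enable is compiling many patterns into one machine, so each URL is scanned once rather than once per pattern. The same idea can be approximated with the JDK's own regex engine by joining the patterns into a single alternation, though java.util.regex is backtracking-based and does not give the linear-time guarantee of a DFA library like the one above. A sketch (the class and method names are invented for illustration):

```java
import java.util.List;
import java.util.regex.Pattern;

public class CombinedPatternSketch {
    // Join many patterns into one alternation so each URL is matched in a
    // single pass over one compiled Pattern, instead of looping over 100K
    // separately compiled patterns. Each pattern is wrapped in a
    // non-capturing group so alternation precedence stays correct.
    public static Pattern combine(List<String> patterns) {
        StringBuilder sb = new StringBuilder();
        for (String p : patterns) {
            if (sb.length() > 0) sb.append('|');
            sb.append("(?:").append(p).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = combine(List.of(
                "https?://([a-z0-9.]+\\.)?example\\.com/.*",
                "https?://news\\.site\\.org/.*"));
        System.out.println(p.matcher("http://www.example.com/page").matches()); // true
        System.out.println(p.matcher("http://evil.com/").matches());            // false
    }
}
```

Whether this or a true DFA holds up at 100k patterns is exactly what the suggested test would show; with backtracking engines, pathological patterns in the list can still blow up match time.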