Hi Andrzej,

Yes, it seems like a good option. However, it is GPL-licensed, and I noticed in one of the posts that this license is no good for apache.org :).

Regards,

Gal


Andrzej Bialecki wrote:
Gal Nitzan wrote:
Hi,

Well, the reason for this plugin is that I wish to crawl many sites, but they all must be in my list. If it were implemented with regular expressions, the filter would still have to loop over 100K expressions on each URL looking for a match, right?

No, that's the whole point - using the library I mentioned you can build a _single_ finite state automaton from all expressions. No looping, just traversing a tree (or whatever equivalent structure they use).

100K regexps is still a lot, so I'm not totally sure it would be much faster, but it is perhaps worth checking.
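
To illustrate the "no looping" point, here is a minimal sketch, assuming the list holds literal host names rather than general regular expressions. A trie is the simplest finite state automaton for a set of strings. The class name HostTrie and its methods are hypothetical and are not the API of the library mentioned above, which compiles full regular expressions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a trie acting as one automaton
// for a set of literal host names. Not the library from this thread.
class HostTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean accepting; // true if a listed host ends at this node
    }

    private final Node root = new Node();

    // Add one literal host name to the automaton.
    void add(String host) {
        Node node = root;
        for (int i = 0; i < host.length(); i++) {
            node = node.next.computeIfAbsent(host.charAt(i), c -> new Node());
        }
        node.accepting = true;
    }

    // Single pass over the input: the cost depends on the length of
    // the host, not on how many entries were added.
    boolean contains(String host) {
        Node node = root;
        for (int i = 0; i < host.length(); i++) {
            node = node.next.get(host.charAt(i));
            if (node == null) {
                return false; // no transition, so the host is not listed
            }
        }
        return node.accepting;
    }
}
```

With 100K entries added, contains("example.org") still follows at most eleven transitions, one per character. Whether a regex-derived automaton behaves as well at that scale is the part that would need measuring.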


