I've been watching the discussion of faster regex libraries with much interest. But if regex speed is the problem, would using fewer regexes be a good answer?

Protocol and extension filtering could be done by another URLFilter plugin that is dedicated to this task, and uses more lightweight string-chopping techniques. That way full regex support could be retained for the tasks where it's really needed.
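
As a rough illustration of what such string-chopping could look like, here is a minimal sketch in plain Java. The class name, the constructor arguments, and the convention of returning null for a rejected URL are assumptions made for this example; it is not an existing Nutch plugin and is not wired to the plugin system.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical filter: checks protocol and file extension with no regexes. */
final class PrefixSuffixUrlFilter {
    private final Set<String> allowedProtocols;   // lower-case, e.g. "http"
    private final Set<String> blockedExtensions;  // lower-case, no dot, e.g. "gif"

    PrefixSuffixUrlFilter(Set<String> allowedProtocols, Set<String> blockedExtensions) {
        this.allowedProtocols = allowedProtocols;
        this.blockedExtensions = blockedExtensions;
    }

    /** Returns the URL unchanged if accepted, or null if rejected (assumed convention). */
    String filter(String url) {
        // The protocol is whatever precedes the FIRST colon, so a URL embedded
        // in the query string cannot be mistaken for it.
        int colon = url.indexOf(':');
        if (colon <= 0) {
            return null;
        }
        String protocol = url.substring(0, colon).toLowerCase(Locale.ROOT);
        if (!allowedProtocols.contains(protocol)) {
            return null;
        }

        // Ignore the query string and fragment when looking for an extension.
        int end = url.length();
        int query = url.indexOf('?');
        if (query >= 0) {
            end = query;
        }
        int fragment = url.indexOf('#');
        if (fragment >= 0 && fragment < end) {
            end = fragment;
        }

        // Skip "//host" so that ".com" in the host name is not read as an extension.
        int hostStart = url.startsWith("//", colon + 1) ? colon + 3 : colon + 1;
        int pathStart = url.indexOf('/', hostStart);
        if (pathStart < 0 || pathStart >= end) {
            return url; // no path, hence no extension to check
        }

        String path = url.substring(pathStart, end);
        int dot = path.lastIndexOf('.');
        if (dot > path.lastIndexOf('/') && dot < path.length() - 1) {
            String extension = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (blockedExtensions.contains(extension)) {
                return null;
            }
        }
        return url;
    }
}
```

A filter like this would run before the regex filter, so the remaining regex rules only see URLs that have already passed the cheap checks.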


On Mar 13, 2006, at 12:31 PM, Howie Wang wrote:


I have made some quick tests with regex-urlfilter...
The major problem is that it doesn't use the Perl syntax...
For instance, it doesn't support the boundary matchers ^ and $ (which are
used in nutch)

Are there other ways to match start/end of string in the other
regex library? I use "^http" a lot because a lot of sites pass around
URLs in the query string, and I don't want them (e.g.
http://del.icio.us/howie?url=http://lucene.apache.org/nutch)

Howie

--
Matt Kangas / [EMAIL PROTECTED]

