Greetings, I'm new to Nutch and I'm jumping right in to develop some filters. However, I first have a couple of questions that I would really appreciate answers to. Please keep in mind that I have already read the documentation and reviewed a good portion of the code.
First, are filters loaded once and re-used, or should I expect a new instance to be created per URL? I'm going to implement a very expensive loading process, and depending on the answer I'll either cache the result statically or not worry about it at all, if the filter is only instantiated once anyway.

Secondly, and related to my first question, what sort of performance does the URL matching provide? Are compiled patterns cached? I have upward of 3,374,121 regular expressions for URL and domain filtering that will provide an exceedingly fine granularity of acceptance.

Basically, if the default regex URL filter already caches both the loaded expressions and the compiled patterns, then I should be able to write a script which dumps the expressions from the database into the configuration file. Otherwise, I'll implement my own filter to load them directly from the database.

Thanks in advance,
Michael
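
To make the static-caching option in the first question concrete, here is a minimal sketch in plain Java. It does not use any Nutch classes: the class name CachedPatternFilter, the LOADS counter, and the two example rules are all hypothetical, and the in-memory load() only stands in for the database load described above. The filter method is assumed to follow the convention of returning the URL to accept it and null to reject it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Hypothetical sketch, not Nutch code: all names here are invented for illustration.
final class CachedPatternFilter {
    // Counts how often the expensive load runs; present only to demonstrate the caching.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs this static initializer
    // once per class loader, the first time Holder is touched.
    private static final class Holder {
        static final List<Pattern> PATTERNS = load();
    }

    // Stand-in for the expensive database load; the two rules are placeholders.
    private static List<Pattern> load() {
        LOADS.incrementAndGet();
        List<Pattern> compiled = new ArrayList<>();
        for (String rule : new String[] {
                "^https?://([a-z0-9-]+\\.)*example\\.com/.*",
                "^https?://example\\.org/docs/.*"}) {
            compiled.add(Pattern.compile(rule)); // compile once, reuse for every URL
        }
        return compiled;
    }

    // Assumed convention: return the URL to accept it, null to reject it.
    String filter(String url) {
        for (Pattern p : Holder.PATTERNS) {
            if (p.matcher(url).matches()) {
                return url;
            }
        }
        return null;
    }
}
```

With this layout the patterns are loaded and compiled once no matter how many filter instances get created, so the answer to the first question stops mattering for load cost. It does not help with the second question, though: every URL is still tested against each pattern in turn.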

