I know this is off on a tangent, but one huge advantage of filtering in the FetchListTool (or is that the Generator? I'm still on 0.7) is that you can generate separate fetch lists for separate "scopes", or subsets of your crawl data. You can then give your users some control over which of several scopes they're actually searching in, all while keeping a single URL database. I suspect many people who are using Nutch on one site or a small number of sites are actually doing this.

Regards,
David.
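
To illustrate the scoping idea above, here is a minimal sketch in plain Java. It does not use the Nutch API at all: the class name `ScopedFetchLists`, the scope names, and the URL prefixes are all hypothetical, and a real setup would presumably apply its URL filters inside the fetch list generation step rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Nutch: splits one URL list into per-scope fetch lists.
class ScopedFetchLists {

    // 'scopes' maps a scope name to the URL prefix that defines it (both are made up here).
    static Map<String, List<String>> partition(List<String> urlDb, Map<String, String> scopes) {
        Map<String, List<String>> fetchLists = new LinkedHashMap<>();
        for (String scope : scopes.keySet()) {
            fetchLists.put(scope, new ArrayList<>());
        }
        // Every URL comes from the same database; the filter only decides which list it joins.
        for (String url : urlDb) {
            for (Map.Entry<String, String> e : scopes.entrySet()) {
                if (url.startsWith(e.getValue())) {
                    fetchLists.get(e.getKey()).add(url);
                }
            }
        }
        return fetchLists;
    }

    public static void main(String[] args) {
        List<String> urlDb = List.of(
            "http://example.org/docs/a.html",
            "http://example.org/news/b.html",
            "http://example.org/docs/c.html");
        Map<String, String> scopes = new LinkedHashMap<>();
        scopes.put("docs", "http://example.org/docs/");
        scopes.put("news", "http://example.org/news/");
        System.out.println(partition(urlDb, scopes));
    }
}
```

URLs that match no scope are simply left out of every fetch list, so the single database can hold more than any one scope exposes.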
Date: Wed, 08 Mar 2006 10:42:50 -0800
From: Doug Cutting <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Nutch-dev] Re: svn commit: r384219 - /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
Reply-To: [EMAIL PROTECTED]

Andrzej Bialecki wrote:
> IMHO doing this here has a minimal impact while preventing a common
> problem, but if you think this would harm many users then we should of
> course make it optional.

Let's just leave it as-is for now.

Thanks!

Doug
