Re: Urlfilter Patch

Doug Cutting Thu, 01 Dec 2005 13:40:32 -0800

Matt Kangas wrote:

The latter is not strictly true. Nutch could issue an HTTP HEAD before the HTTP GET, and determine the mime-type before actually grabbing the content.
It's not how Nutch works now, but this might be more useful than a super-detailed set of regexes...

This could be a useful addition, but it could not replace url-based filters. A HEAD request must still be polite, so this could substantially slow fetching, as it would incur more delays. Also, for most dynamic pages, a HEAD is as expensive for the server as a GET, so this would cause more load on servers.
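
For illustration only, a HEAD-before-GET check might look roughly like the sketch below. This is not Nutch code: the class HeadProbe, its method names, the allowed-type set and the 5-second timeouts are all hypothetical, and only standard java.net calls are used. It also shows where the extra cost comes from, since every URL that passes the check costs two requests to the same host instead of one.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: probes a URL with HEAD before fetching it.
public class HeadProbe {

    // Strips parameters such as "; charset=UTF-8" and lower-cases the media type.
    // Returns an empty string when the server sent no Content-Type at all.
    public static String baseType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // True if the reported media type is one we would want to fetch.
    public static boolean isWanted(String contentType, Set<String> allowed) {
        return allowed.contains(baseType(contentType));
    }

    // Issues a HEAD request and returns the raw Content-Type header (may be null).
    public static String headContentType(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // assumed timeout values
            conn.setReadTimeout(5000);
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java HeadProbe <url>");
            return;
        }
        Set<String> allowed = Set.of("text/html", "text/plain"); // assumed policy
        URL url = new URL(args[0]);
        String type = headContentType(url);
        // A polite crawler would still wait its per-host delay before the GET,
        // so each accepted URL incurs two delayed requests instead of one.
        System.out.println(url + " -> " + type
                + (isWanted(type, allowed) ? " (fetch)" : " (skip)"));
    }
}
```

A url-based filter would still run first, so that the HEAD is only sent for URLs the cheap check cannot decide.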


Doug
