> Jérôme: One can provide If-Modified-Since information in GET requests,
> too.  I think that's preferable to HEAD, because with HEAD requests one
> would have to first perform a HEAD request, and then another GET for
> changed pages.  With the conditional GET request a single request is
> all that's needed, as long as the If-Modified-Since request header is
> provided.
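
For reference, the single-request conditional GET described above might
look like the sketch below. It is only an illustration built on the
JDK's HttpURLConnection, not the Http plugin's actual code; the class
name ConditionalGet and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Nutch.
class ConditionalGet {

    /** True when the status code says the cached copy is still current. */
    static boolean isUnchanged(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_NOT_MODIFIED; // 304
    }

    /**
     * Issues one GET carrying If-Modified-Since.
     * Returns null when the server answers 304 Not Modified,
     * otherwise the response body.
     */
    static byte[] fetchIfModified(URL url, long lastFetchMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Sets the If-Modified-Since request header from a timestamp in millis.
        conn.setIfModifiedSince(lastFetchMillis);
        try {
            if (isUnchanged(conn.getResponseCode())) {
                return null; // nothing to download, keep the stored copy
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes(); // requires Java 9 or later
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass the time of the previous fetch and treat a null
result as "page unchanged".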
Yes, that's true. But a HEAD request can still be useful if you want
to filter on HTTP headers. For instance, if you don't want to download
resources of certain content types, you can perform a HEAD request and
cancel the operation if the Content-Type of the HEAD response is not
one you want to index.
Moreover, if the code keeps the same connection open for both
requests (HEAD and GET), the extra request should not cost much in
performance.
A more elaborate use of the HEAD method would be to use it for
resources that are rarely modified, and to use a conditional GET
(If-Modified-Since) for resources that are modified frequently
(this implies that Nutch must keep a history of modifications!).
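
A rough sketch of the HEAD-based content-type filter follows. It is
an assumption-laden illustration, not the Http plugin's code: the
class HeadFilter, the WANTED allow-list and the method names are all
hypothetical, and it uses the JDK's HttpURLConnection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
class HeadFilter {

    // Assumed allow-list of media types worth indexing.
    static final Set<String> WANTED = Set.of("text/html", "text/plain");

    /** Strips parameters such as "; charset=UTF-8" and checks the allow-list. */
    static boolean isWanted(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false; // design choice: skip resources that declare no type
        }
        int semi = contentTypeHeader.indexOf(';');
        String mediaType = (semi >= 0
                ? contentTypeHeader.substring(0, semi)
                : contentTypeHeader).trim().toLowerCase(Locale.ROOT);
        return WANTED.contains(mediaType);
    }

    /** Sends a HEAD request and reports whether a follow-up GET is worthwhile. */
    static boolean shouldDownload(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        int status = conn.getResponseCode();
        // disconnect() is deliberately not called here, so the JDK's
        // keep-alive handling may reuse the socket for the following GET
        // when the server allows persistent connections.
        return status == HttpURLConnection.HTTP_OK
                && isWanted(conn.getContentType());
    }
}
```

The GET is then issued only when shouldDownload returns true.
Treating a missing Content-Type as "not wanted" is just one possible
policy; a crawler could equally fall back to the GET in that case.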

Once I finish implementing Mime-Magic support, I will run some
tests of the HEAD method in the Http Plugin.

Jérôme
