> Jérôme: One can provide If-Modified-Since information in GET requests,
> too. I think that's preferable to HEAD, because with HEAD requests one
> would have to first perform a HEAD request, and then another GET for
> changed pages. With the conditional GET request a single request is
> all that's needed, as long as the If-Modified-Since request header is
> provided.

Yes, that's true. But a HEAD request can be useful if you want to filter on HTTP headers. For instance, if you don't want to download resources of certain content types, you can perform a HEAD request and cancel the operation if the Content-Type of the HEAD response is not one you want to index. Moreover, if the code keeps the same connection open for the two requests (HEAD and GET), it should not hurt performance much.

A more complex use of the HEAD method would be to use it for resources that are not modified frequently, and to use a GET (If-Modified-Since) request for resources that are modified frequently (this implies that Nutch must keep a history of modifications!).

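
To make the idea concrete, here is a rough sketch using plain java.net.HttpURLConnection. It is not the Http plugin's code: the class name, the INDEXABLE allow-list, the 10% threshold in chooseMethod and the -1 return value are all hypothetical and only there for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;
import java.util.Set;

final class HeadThenGetSketch {

    /** Hypothetical allow-list of media types worth indexing. */
    static final Set<String> INDEXABLE = Set.of("text/html", "text/plain");

    /** Drops parameters such as "; charset=UTF-8" and checks the bare media type. */
    static boolean isIndexable(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false; // assumption: skip resources that declare no type
        }
        String mediaType = contentTypeHeader.split(";", 2)[0]
                .trim().toLowerCase(Locale.ROOT);
        return INDEXABLE.contains(mediaType);
    }

    /**
     * Hypothetical policy based on a per-URL modification history:
     * HEAD first for pages that rarely change, conditional GET otherwise.
     */
    static String chooseMethod(int observedChanges, int fetches) {
        if (fetches == 0) {
            return "GET"; // no history yet
        }
        double changeRatio = observedChanges / (double) fetches;
        return changeRatio < 0.1 ? "HEAD" : "GET"; // 0.1 is an arbitrary threshold
    }

    /**
     * Sends a HEAD request, gives up if the Content-Type is unwanted, and
     * otherwise sends a GET carrying If-Modified-Since.
     * Returns -1 when skipped, else the GET status code (304 = not modified).
     * The response body is not read here.
     */
    static int fetchIfWanted(URL url, long lastFetchMillis) throws IOException {
        HttpURLConnection head = (HttpURLConnection) url.openConnection();
        head.setRequestMethod("HEAD");
        // getContentType() sends the request; disconnect() is deliberately not
        // called so the JDK may reuse the keep-alive connection for the GET.
        if (!isIndexable(head.getContentType())) {
            return -1;
        }
        HttpURLConnection get = (HttpURLConnection) url.openConnection();
        get.setIfModifiedSince(lastFetchMillis); // sets the If-Modified-Since header
        return get.getResponseCode();
    }
}
```

The filtering and the policy are kept as pure functions so they can be checked without any network access; only fetchIfWanted talks to a server.
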
Once I finish implementing Mime-Magic support, I will run some tests of the HEAD method in the Http plugin.

Jérôme
