[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529522 ]

Andrzej Bialecki  commented on NUTCH-557:
-----------------------------------------

I agree with Dogacan - I don't see why this plugin shouldn't be turned into a
patch for protocol-httpclient that simply adds the options you added to your
plugin. Other than these options, the two plugins are identical.

Regarding the benefits of using HTTP/1.1: the main difference, from the Nutch
point of view, would be the support for keep-alives, i.e. the ability to send
multiple requests over the same TCP connection. In practice, however, this
functionality is only rarely useful in our case, because it requires making
many consecutive requests to the same host, whereas Nutch shuffles the hosts
in order to provide higher throughput while still honoring the politeness
settings. With a large fetchlist containing many hosts, consecutive requests
almost never go to the same host. So in order to benefit from keep-alives we
would either have to keep massive numbers of connections open (infeasible),
or drop connections between requests ... which is what HTTP/1.0 does :)
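
To make the keep-alive argument concrete, here is a rough simulation sketch.
It is not Nutch code: the round-robin interleaving is only a simplified
stand-in for how the fetchlist mixes hosts, and the names interleave_by_host,
reuse_rate and max_open are hypothetical, chosen for this illustration. It
counts how often a request would find an already-open connection when only a
bounded number of idle connections is kept around.

```python
from collections import OrderedDict


def interleave_by_host(urls_by_host):
    """Return the sequence of hosts hit when URLs are taken round-robin.

    ASSUMPTION: plain round-robin is a simplification used for illustration;
    it is not the actual Nutch fetchlist ordering.
    """
    hosts = list(urls_by_host.keys())
    queues = [list(urls) for urls in urls_by_host.values()]
    order = []
    while any(queues):  # stop once every per-host queue is empty
        for host, queue in zip(hosts, queues):
            if queue:
                queue.pop(0)
                order.append(host)
    return order


def reuse_rate(host_sequence, max_open):
    """Fraction of requests that can reuse a kept-alive connection.

    At most max_open idle connections are kept; the least recently used
    one is closed when the limit is exceeded.
    """
    open_conns = OrderedDict()
    reused = 0
    for host in host_sequence:
        if host in open_conns:
            reused += 1
            open_conns.move_to_end(host)  # mark as most recently used
        else:
            open_conns[host] = True  # open a new connection
            if len(open_conns) > max_open:
                open_conns.popitem(last=False)  # close the oldest one
    return reused / len(host_sequence)


if __name__ == "__main__":
    # Hypothetical fetchlist: 100 hosts with 5 URLs each.
    fetchlist = {
        "host%d.example.com" % i: ["/page%d" % j for j in range(5)]
        for i in range(100)
    }
    sequence = interleave_by_host(fetchlist)
    for limit in (1, 10, 100):
        print(limit, reuse_rate(sequence, limit))
```

Under these assumptions no connection is ever reused until the pool is large
enough to hold one open connection per host in the fetchlist, which is the
"massive numbers of open connections" case described above.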

> protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-557
>                 URL: https://issues.apache.org/jira/browse/NUTCH-557
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Susam Pal
>            Priority: Minor
>         Attachments: protocol-http11v0.1.patch
>
>
> 'protocol-http11' is a protocol plugin which supports retrieving documents
> via the HTTP 1.0, HTTP 1.1 and HTTPS protocols, optionally with Basic, Digest
> and NTLM authentication schemes for web servers as well as proxy servers.
> The user guide and other information can be found here:
> [http://wiki.apache.org/nutch/protocol-http11]
