Doug Cutting wrote:
> ... protocol-http is capable of faster crawling than protocol-httpclient.
> So I don't think we should discard protocol-http just yet.
> What do others think?

I think: the HttpClient-based plugin [protocol-httpclient] uses its own threads, while [protocol-http] does not create threads. We should manage this. [protocol-httpclient] is just a temporary solution for cookies, proxies, HTTPS, etc., and it still caches DNS-to-IP mappings forever. Thread-related issues are very important...

Additionally, we should have a setting such as "Wait 5 seconds between requests to SLOW servers". This means that Nutch could dynamically classify servers as fast or slow and crawl them faster or slower accordingly...

Fuad
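
To make the proposed per-server delay setting concrete, here is a minimal sketch of one way it might work. This is not existing Nutch code: the class name `AdaptiveHostDelay`, its methods, the 1-second delay for fast hosts, and the use of a moving average of response times to decide what counts as "slow" are all assumptions made for illustration. Only the 5-second delay for slow servers comes from the suggestion above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Nutch): chooses a per-host delay
 * between requests based on observed response times.
 */
public class AdaptiveHostDelay {
    private final long slowThresholdMs; // average response time above this marks a host as slow (assumed criterion)
    private final long fastDelayMs;     // delay between requests to fast hosts (assumed value)
    private final long slowDelayMs;     // delay between requests to slow hosts, e.g. 5000
    private final double alpha;         // weight of the newest sample in the moving average, 0..1

    // Exponential moving average of response time per host; safe for use from several fetcher threads.
    private final Map<String, Double> avgResponseMs = new ConcurrentHashMap<>();

    public AdaptiveHostDelay(long slowThresholdMs, long fastDelayMs,
                             long slowDelayMs, double alpha) {
        this.slowThresholdMs = slowThresholdMs;
        this.fastDelayMs = fastDelayMs;
        this.slowDelayMs = slowDelayMs;
        this.alpha = alpha;
    }

    /** Record how long the last request to this host took. */
    public void record(String host, long responseMs) {
        avgResponseMs.merge(host, (double) responseMs,
                (old, sample) -> (1 - alpha) * old + alpha * sample);
    }

    /** A host with no samples yet is treated as fast. */
    public boolean isSlow(String host) {
        Double avg = avgResponseMs.get(host);
        return avg != null && avg > slowThresholdMs;
    }

    /** Delay to wait before the next request to this host. */
    public long delayMs(String host) {
        return isSlow(host) ? slowDelayMs : fastDelayMs;
    }
}
```

A fetcher thread would call `record(host, elapsedMs)` after each response and sleep for `delayMs(host)` before its next request to the same host. Because the classification follows a moving average, a server that recovers is gradually treated as fast again.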
