protocol-http versus protocol-httpclient

2005-11-09 Thread Doug Cutting
I was recently benchmarking fetching at a site with lots of bandwidth, and it seemed to me that protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. But there's a lot of duplicate code between these, which is

Re: protocol-http versus protocol-httpclient

2005-11-09 Thread Matt Kangas
+1 I've been planning to switch my crawler over to use protocol-httpclient, but haven't got there yet. Interesting that there seems to be a performance impact with the new plugin. (In my crawl setup, I override the default HTTP plugin so I can modify HTML content before it is written to

RE: protocol-http versus protocol-httpclient

2005-11-09 Thread Fuad Efendi
Doug Cutting wrote: ... protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. What do others think? I think: HttpClient-based [protocol-httpclient] uses its own threads; [protocol-http] does not create threads. We should

Re: protocol-http versus protocol-httpclient

2005-11-09 Thread Ken Krugler
I was recently benchmarking fetching at a site with lots of bandwidth, and it seemed to me that protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. But there's a lot of duplicate code between these, which is