I was recently benchmarking fetching at a site with lots of bandwidth,
and it seemed to me that protocol-http is capable of faster crawling
than protocol-httpclient. So I don't think we should discard
protocol-http just yet. But there's a lot of duplicate code between
these, which is
+1
I've been planning to switch my crawler over to use protocol-httpclient,
but haven't got there yet. It's interesting that there seems to be a
performance impact with the new plugin.
(In my crawl setup, I override the default HTTP plugin so I can
modify HTML content before it is written to
Doug Cutting wrote:
... protocol-http is capable of faster crawling than protocol-httpclient.
So I don't think we should discard protocol-http just yet.
What do others think?
I think:
The HttpClient-based plugin [protocol-httpclient] uses its own threads.
[protocol-http] does not create any threads of its own.
We should