+1
I've been planning to switch my crawler over to use protocol-httpclient,
but haven't got there yet. Interesting that there seems
to be a performance impact with the new plugin.
(In my crawl setup, I override the default HTTP plugin so I can
modify HTML content before it is written to a segment. I'd prefer it if
there were a hook for rewriting content regardless of protocol, but
this works for now.)
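A protocol-independent rewrite hook of the kind wished for above might look roughly like the Java sketch below. The `ContentRewriter` interface, the `ContentRewriteHook` class, and the idea that `apply` runs after any protocol fetch and before the segment write are all hypothetical. None of them are existing Nutch APIs. The example rewriter also assumes UTF-8 content purely for simplicity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a content-rewriting hook that does not depend on the fetch protocol. */
public class ContentRewriteHook {

    /** Hypothetical extension point, assumed to run after any protocol fetch and before the segment write. */
    public interface ContentRewriter {
        byte[] rewrite(String url, String contentType, byte[] content);
    }

    private final List<ContentRewriter> rewriters = new ArrayList<>();

    public void register(ContentRewriter rewriter) {
        rewriters.add(rewriter);
    }

    /** Runs every registered rewriter in registration order; the URL scheme plays no part. */
    public byte[] apply(String url, String contentType, byte[] content) {
        byte[] out = content;
        for (ContentRewriter rewriter : rewriters) {
            out = rewriter.rewrite(url, contentType, out);
        }
        return out;
    }

    public static void main(String[] args) {
        ContentRewriteHook hook = new ContentRewriteHook();
        // Example rewriter: drop <script> blocks from HTML, assuming UTF-8 bytes for simplicity.
        hook.register((url, type, bytes) -> {
            if (!type.startsWith("text/html")) {
                return bytes; // leave non-HTML content untouched
            }
            String html = new String(bytes, StandardCharsets.UTF_8);
            return html.replaceAll("(?is)<script.*?</script>", "").getBytes(StandardCharsets.UTF_8);
        });
        byte[] page = "<p>hi</p><script>x()</script>".getBytes(StandardCharsets.UTF_8);
        byte[] result = hook.apply("ftp://example.com/a.html", "text/html", page);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `<p>hi</p>` even though the URL is an `ftp://` one, because the hook only looks at the content type and never at which protocol produced the bytes.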
--Matt
On Nov 9, 2005, at 1:19 PM, Doug Cutting wrote:
I was recently benchmarking fetching at a site with lots of
bandwidth, and it seemed to me that protocol-http is capable of
faster crawling than protocol-httpclient. So I don't think we
should discard protocol-http just yet. But there's a lot of
duplicate code between these, which is difficult to maintain.
I think we should thus merge these, with a configuration parameter
determining which http backend is used, much like parse-html, which
can switch between neko and tagsoup.
What do others think?
Doug
--
Matt Kangas / [EMAIL PROTECTED]