+1

I've been planning to switch my crawler over to use protocol-httpclient, but haven't got there yet. Interesting that there seems to be a performance impact with the new plugin.

(In my crawl setup, I override the default HTTP plugin so I can modify HTML content before it is written to a segment. I'd prefer if there was a hook for rewriting content regardless of protocol, but this works for now.)
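To illustrate the kind of hook meant here, a minimal sketch (in Python for brevity, not Nutch code): a wrapper that applies a rewrite step to whatever any fetch function returns, so the rewrite does not depend on the protocol. The names `with_rewrite`, `fake_fetch`, and `strip_scripts` are hypothetical and exist only for this example.

```python
import re
from typing import Callable

# Hypothetical types: a fetcher maps a URL to raw content, and a
# rewriter maps (URL, content) to modified content.
Fetcher = Callable[[str], bytes]
Rewriter = Callable[[str, bytes], bytes]


def with_rewrite(fetch: Fetcher, rewrite: Rewriter) -> Fetcher:
    """Wrap any fetcher so its content passes through `rewrite` first."""
    def fetch_and_rewrite(url: str) -> bytes:
        return rewrite(url, fetch(url))
    return fetch_and_rewrite


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real protocol plugin; returns fixed HTML.
    return b"<html><script>x</script><p>hi</p></html>"


def strip_scripts(url: str, content: bytes) -> bytes:
    # Example rewrite: drop <script> elements before the content is stored.
    return re.sub(rb"<script.*?</script>", b"", content, flags=re.DOTALL)


fetch = with_rewrite(fake_fetch, strip_scripts)
print(fetch("http://example.com/"))
```

Because the wrapper only sees a function from URL to content, the same rewrite could sit in front of either HTTP plugin, or any other protocol, without overriding the plugin itself.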

--Matt

On Nov 9, 2005, at 1:19 PM, Doug Cutting wrote:

I was recently benchmarking fetching at a site with lots of bandwidth, and it seemed to me that protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. But there's a lot of duplicate code between these, which is difficult to maintain.

I think we should thus merge these, with a configuration parameter determining which http backend is used, much like parse-html, which can switch between neko and tagsoup.
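A rough sketch of what such a switch could look like (in Python for brevity, not Nutch code). The property name `http.backend`, the backend names, and both fetch functions are assumptions made up for this example; the real parameter and classes would be whatever the merged plugin defines.

```python
from typing import Callable, Dict

# Hypothetical backends: each takes a URL and returns a string that
# records which implementation handled it.
def fetch_with_builtin(url: str) -> str:
    return f"builtin:{url}"


def fetch_with_httpclient(url: str) -> str:
    return f"httpclient:{url}"


# Registry of available backends, keyed by the configured name.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "builtin": fetch_with_builtin,
    "httpclient": fetch_with_httpclient,
}


def select_backend(conf: Dict[str, str]) -> Callable[[str], str]:
    """Pick the backend named by the (hypothetical) `http.backend` property."""
    name = conf.get("http.backend", "builtin")  # assumed default
    if name not in BACKENDS:
        raise ValueError(f"unknown http backend: {name!r}")
    return BACKENDS[name]


fetch = select_backend({"http.backend": "httpclient"})
print(fetch("http://example.com/"))
```

The shared code (request setup, redirect handling, content limits) would live once in the merged plugin, with only the part behind `select_backend` differing per backend.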

What do others think?

Doug

--
Matt Kangas / [EMAIL PROTECTED]

