Doug Cutting wrote:

> I was recently benchmarking fetching at a site with lots of bandwidth, and it seemed to me that protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. But there's a lot of duplicate code between these, which is difficult to maintain.

Where do you think the performance loss in protocol-httpclient is?

> I think we should thus merge these, with a configuration parameter determining which HTTP backend is used, much like parse-html, which can switch between neko and tagsoup.
>
> What do others think?

I think it's a good idea. Things like authentication, robots, redirects, SSL setup and HTTP result code handling logic are nearly the same in both plugins.
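
To make the proposal concrete, here is a rough sketch of how a single plugin could choose its backend from configuration. Everything in it is an assumption for illustration only: the property name "http.backend", the HttpBackend interface and the two stub classes are hypothetical and are not existing Nutch code. The shared logic (authentication, robots, redirects, SSL setup, result code handling) would live once in the common part, and only the raw fetch would differ per backend.

```java
import java.util.Map;

public class HttpBackendSketch {

    /** Hypothetical interface: the only piece each backend implements itself. */
    interface HttpBackend {
        String name();
    }

    /** Stub standing in for the lightweight code path of protocol-http. */
    static class BuiltinBackend implements HttpBackend {
        public String name() { return "builtin"; }
    }

    /** Stub standing in for the code path of protocol-httpclient. */
    static class HttpClientBackend implements HttpBackend {
        public String name() { return "httpclient"; }
    }

    /**
     * Picks a backend from a configuration map.
     * The key "http.backend" and its values are assumptions, not real Nutch properties.
     */
    static HttpBackend select(Map<String, String> conf) {
        String impl = conf.getOrDefault("http.backend", "builtin");
        switch (impl) {
            case "builtin":
                return new BuiltinBackend();
            case "httpclient":
                return new HttpClientBackend();
            default:
                throw new IllegalArgumentException("Unknown http.backend: " + impl);
        }
    }

    public static void main(String[] args) {
        // Select the httpclient-style backend explicitly, then fall back to the default.
        System.out.println(select(Map.of("http.backend", "httpclient")).name());
        System.out.println(select(Map.of()).name());
    }
}
```

Failing loudly on an unknown value, rather than silently falling back, is just one possible choice here; it would make a typo in the configuration visible at startup.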

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

