Doug Cutting wrote:
> I was recently benchmarking fetching at a site with lots of bandwidth,
> and it seemed to me that protocol-http is capable of faster crawling
> than protocol-httpclient. So I don't think we should discard
> protocol-http just yet. But there's a lot of duplicate code between
> these, which is difficult to maintain.

Where do you think the performance is being lost in protocol-httpclient?

> I think we should thus merge these, with a configuration parameter
> determining which http backend is used, much like parse-html, which
> can switch between neko and tagsoup.
> What do others think?

I think it's a good idea. The logic for authentication, robots, redirects,
SSL setup and HTTP result code handling is nearly the same in both.
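
To make the proposal concrete, here is a rough sketch of how a merged plugin
might pick its backend from configuration while keeping the common logic in
one place. Everything in it is an assumption for illustration: the property
name "http.backend", the HttpBackend interface and both backend classes are
hypothetical, and the backends are stubs, not the actual protocol-http or
protocol-httpclient code.

```java
import java.util.Collections;
import java.util.Map;

public class HttpBackendSketch {

    /** Hypothetical abstraction over the two transport implementations. */
    interface HttpBackend {
        String name();
    }

    /** Stub standing in for the raw-socket code of protocol-http. */
    static final class SocketBackend implements HttpBackend {
        public String name() { return "socket"; }
    }

    /** Stub standing in for the library-based code of protocol-httpclient. */
    static final class HttpClientBackend implements HttpBackend {
        public String name() { return "httpclient"; }
    }

    /** Hypothetical configuration key; not an existing Nutch property. */
    static final String BACKEND_KEY = "http.backend";

    /** Chooses the backend from configuration, defaulting to the socket one. */
    static HttpBackend select(Map<String, String> conf) {
        String wanted = conf.getOrDefault(BACKEND_KEY, "socket");
        switch (wanted) {
            case "socket":
                return new SocketBackend();
            case "httpclient":
                return new HttpClientBackend();
            default:
                throw new IllegalArgumentException(
                        "Unknown " + BACKEND_KEY + " value: " + wanted);
        }
    }

    /**
     * Example of logic that would live once in the merged plugin instead of
     * being duplicated per backend: coarse handling of HTTP result codes.
     */
    static String classify(int statusCode) {
        if (statusCode >= 200 && statusCode < 300) return "success";
        if (statusCode >= 300 && statusCode < 400) return "redirect";
        if (statusCode >= 400 && statusCode < 500) return "client-error";
        if (statusCode >= 500 && statusCode < 600) return "server-error";
        return "unknown";
    }

    public static void main(String[] args) {
        Map<String, String> conf =
                Collections.singletonMap(BACKEND_KEY, "httpclient");
        HttpBackend backend = select(conf);
        System.out.println("backend=" + backend.name()
                + " 302=" + classify(302));
    }
}
```

With this shape, authentication, robots, redirects and SSL setup would sit
next to classify() in the shared part, and only the code that actually talks
to the server would differ per backend.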
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com