Andrzej Bialecki wrote:
Hmm... I'm not saying it's flawless, there were surely some mysterious things going on with it. That large crawl you mention, was it with the (recently updated in Nutch) release 3.0? What were the issues?
No, it was in early December, with the previous version. I don't recall the details, but it seemed slower, had a higher error rate, and seemed to result in more hung thread incidents.
The main advantage of protocol-http is that it's so simple that few things can go wrong, but this also means it's relatively unsophisticated, and adding more advanced features, namely support for https, cookies and authentication, could mean a lot of work.
These are all good reasons to use protocol-httpclient. But if you don't need any of those features, protocol-http presently seems to work better.
Perhaps we should get more feedback on the 3.0 version before we make a decision?
Doug
