Edward Quick wrote:
Hi,

I posted to the user list but didn't get a reply. I want to crawl a protected site, but there doesn't seem to be an option for that in Nutch at the moment.

However, it doesn't sound like something that would be too hard to add, assuming the Java HTTP client library can handle that. As I'm not familiar with the code, could someone point me at the file (or files) in the source that do the crawling, please? I'm not professing to be a top Java programmer (Perl's my speciality), but I'll give it a shot, unless anyone else wants to?!

The quick hack would be to add the necessary code somewhere in protocol-httpclient. Eventually, though, I think Nutch should grow an authentication factory, which would supply the needed credentials to other plugins.
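
To make the authentication factory idea a bit more concrete, here is a minimal sketch of what such a component might look like. Everything in it is hypothetical: the names AuthFactorySketch, Credentials, AuthenticationFactory, register and credentialsFor are assumptions made for illustration and are not existing Nutch classes or methods. It also assumes credentials are looked up per host name, which is only one possible design.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: none of these types exist in Nutch.
public class AuthFactorySketch {

    /** Simple username/password holder (hypothetical type). */
    static final class Credentials {
        final String user;
        final String password;

        Credentials(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    /** Hypothetical factory that protocol plugins could ask for credentials. */
    static final class AuthenticationFactory {
        // Credentials keyed by lower-cased host name.
        private final Map<String, Credentials> byHost = new HashMap<>();

        /** Registers credentials for one host (assumed per-host granularity). */
        void register(String host, Credentials credentials) {
            byHost.put(host.toLowerCase(Locale.ROOT), credentials);
        }

        /** Returns the credentials registered for the URL's host, if any. */
        Optional<Credentials> credentialsFor(URI url) {
            String host = url.getHost();
            if (host == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(byHost.get(host.toLowerCase(Locale.ROOT)));
        }
    }

    public static void main(String[] args) {
        AuthenticationFactory factory = new AuthenticationFactory();
        factory.register("intranet.example.com", new Credentials("crawler", "secret"));

        // A protocol plugin would make a lookup like this before fetching.
        URI url = URI.create("http://intranet.example.com/docs/index.html");
        Optional<Credentials> found = factory.credentialsFor(url);
        System.out.println(found.isPresent()
                ? "fetch as user " + found.get().user
                : "fetch anonymously");
    }
}
```

With something along these lines, a plugin such as protocol-httpclient would only need to ask the factory for credentials before each fetch, rather than carrying its own authentication configuration.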

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
