Edward Quick wrote:
Hi,
I posted to the user list but didn't get a reply. I want to crawl a
protected site, but there doesn't seem to be an option for that in Nutch
at the moment.
However, it doesn't sound like something that would be too hard to add,
assuming the Java HTTP client library can handle it. As I'm not
familiar with the code, could someone point me to the file (or files) in
the source that do the crawling, please? I'm not professing to be a top
Java programmer (Perl's my speciality), but I'll give it a shot, unless
anyone else wants to?!
The quick hack would be to add the necessary code somewhere in
protocol-httpclient. Eventually, though, I think Nutch should grow an
authentication factory, which would supply the needed credentials to other
plugins.
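To make the "authentication factory" idea more concrete, here is a minimal sketch in plain Java. It is not Nutch code: the class names `AuthenticationFactory` and `Credentials`, and the host/realm lookup scheme, are hypothetical and chosen only for illustration. The assumption is that a protocol plugin (for example, protocol-httpclient after receiving a 401 challenge) would ask a single shared component for credentials by host and realm, instead of each plugin managing passwords on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of a central credentials store that plugins could query.
 * None of these names exist in Nutch; they are illustrative assumptions only.
 */
public class AuthenticationFactory {

    /** Simple username/password pair (hypothetical type). */
    public static final class Credentials {
        private final String username;
        private final String password;

        public Credentials(String username, String password) {
            this.username = username;
            this.password = password;
        }

        public String getUsername() { return username; }
        public String getPassword() { return password; }
    }

    // Keys are "host" (host-wide entry) or "host/realm" (realm-specific entry).
    private final Map<String, Credentials> store = new HashMap<>();

    /** Registers credentials for a host; a null realm means "any realm on this host". */
    public void register(String host, String realm, Credentials creds) {
        store.put(key(host, realm), creds);
    }

    /** Looks up credentials: exact host+realm match first, then the host-wide fallback. */
    public Optional<Credentials> getCredentials(String host, String realm) {
        if (realm != null) {
            Credentials exact = store.get(key(host, realm));
            if (exact != null) {
                return Optional.of(exact);
            }
        }
        return Optional.ofNullable(store.get(key(host, null)));
    }

    // Host names are case-insensitive, so they are normalized; realms are kept as-is.
    private static String key(String host, String realm) {
        String h = host.toLowerCase();
        return realm == null ? h : h + "/" + realm;
    }

    public static void main(String[] args) {
        AuthenticationFactory factory = new AuthenticationFactory();
        factory.register("intranet.example.com", null, new Credentials("crawler", "secret"));
        factory.register("intranet.example.com", "Admin", new Credentials("admin", "s3cret"));

        // A protocol plugin that got a 401 challenge for realm "Admin" would ask like this:
        System.out.println(factory.getCredentials("intranet.example.com", "Admin")
                .map(Credentials::getUsername).orElse("none"));   // admin
        // Unknown realm on a known host falls back to the host-wide entry:
        System.out.println(factory.getCredentials("intranet.example.com", "Staff")
                .map(Credentials::getUsername).orElse("none"));   // crawler
        // Host with no registered credentials:
        System.out.println(factory.getCredentials("other.example.org", null)
                .map(Credentials::getUsername).orElse("none"));   // none
    }
}
```

Keeping the lookup in one place means a quick hack in protocol-httpclient could later be replaced by a call to such a factory without changing how credentials are configured; where the credentials actually come from (a config file, a properties entry, etc.) is left open here.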
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com