Andrzej, thanks.
A related question: some of the sites I crawl use https: or redirect to https:. Nutch's default settings do not recognize https: as a valid URL. Is there a way to crawl URLs starting with "https:"?

-AJ


Andrzej Bialecki wrote:

AJ Chen wrote:

Hi Andrzej,
Thanks for the suggestion. I'm using the PDF plugin that
comes with Nutch from svn. Where can I get the unreleased
PDFBox version 0.7.2 that works for you?


http://www.pdfbox.com/dist

If you are not too familiar with the classpath setting in plugin.xml, then it's better to just replace the old JAR with the new one, keeping the same name as the old JAR.
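
The JAR swap described above could be scripted roughly as follows. This is only a sketch: the helper name `replace_jar`, its arguments, and the `.bak` backup step are assumptions made for illustration and are not part of Nutch or PDFBox.

```python
import shutil
from pathlib import Path


def replace_jar(old_jar, new_jar):
    """Copy new_jar over old_jar, keeping old_jar's file name.

    Hypothetical helper. The .bak backup is an added precaution,
    not something the advice above requires.
    Returns the backup path, or None if there was no old JAR.
    """
    old_jar = Path(old_jar)
    new_jar = Path(new_jar)
    if not new_jar.is_file():
        raise FileNotFoundError(new_jar)

    backup = None
    if old_jar.exists():
        # Keep a copy of the old JAR so the swap can be undone.
        backup = old_jar.with_name(old_jar.name + ".bak")
        shutil.copy2(old_jar, backup)

    # The new JAR takes over the old file name, so plugin.xml
    # does not need to be edited.
    shutil.copy2(new_jar, old_jar)
    return backup
```

Call it with the path of the PDFBox JAR already in the plugin directory and the path of the downloaded JAR. Both paths depend on the local installation and are left to the reader.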

