Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchTutorial" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=91&rev2=92

Comment:
After release of 1.15: remove -Dsolr.server.url=... which has no effect now; fix passing <Seed Dir>

  == Using the crawl script ==

  If you have followed the section above on how the crawling can be done step by step, you might be wondering how a bash script can be written to automate all the process described above.
- Nutch developers have written one for you :), and it is available at [[bin/crawl]].
+ Nutch developers have written one for you :), and it is available at [[bin/crawl]]. Here the most common options and parameters:

  {{{
- Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
+ Usage: crawl [-i|--index] [-D "key=value"] [-s <Seed Dir>] <Crawl Dir> <Num Rounds>
        -i|--index      Indexes crawl results into a configured indexer
-       -D              A Java property to pass to Nutch calls
+       -D...           A Java property to pass to Nutch calls
-       Seed Dir        Directory in which to look for a seeds file
+       -s <Seed Dir>   Directory in which to look for a seeds file
-       Crawl Dir       Directory where the crawl/link/segments dirs are saved
+       <Crawl Dir>     Directory where the crawl/link/segments dirs are saved
-       Num Rounds      The number of rounds to run this crawl for
+       <Num Rounds>    The number of rounds to run this crawl for
- Example: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/ 2
+ Example: bin/crawl -i -s urls/ TestCrawl/ 2
  }}}

  The crawl script has lot of parameters set, and you can modify the parameters to your needs. It would be ideal to understand the parameters before setting up big crawls.
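
For readers updating their own wrapper scripts, the revised argument order might be sketched as follows. This is only an illustration: `build_crawl_cmd` is a hypothetical helper, not part of Nutch, and it merely prints a command line in the shape of the new usage text (optional `-i`, optional `-s <Seed Dir>`, then `<Crawl Dir>` and `<Num Rounds>`). It does not run a crawl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not shipped with Nutch).
# Prints a bin/crawl command line following the revised usage:
#   crawl [-i|--index] [-D "key=value"] [-s <Seed Dir>] <Crawl Dir> <Num Rounds>
# Arguments: <seed dir, may be empty> <crawl dir> <num rounds> [yes|no for indexing]
build_crawl_cmd() {
  local seed_dir="$1" crawl_dir="$2" rounds="$3" index="${4:-no}"
  local cmd="bin/crawl"

  # -i is only added when indexing is requested.
  if [ "$index" = "yes" ]; then
    cmd="$cmd -i"
  fi

  # The seed directory is now passed via -s and is shown as optional in the usage.
  if [ -n "$seed_dir" ]; then
    cmd="$cmd -s $seed_dir"
  fi

  # Crawl dir and number of rounds remain positional arguments.
  printf '%s %s %s\n' "$cmd" "$crawl_dir" "$rounds"
}

# Reproduces the example from the revised usage text.
build_crawl_cmd "urls/" "TestCrawl/" 2 yes
```

Called as above, the helper prints the same command as the new example in the usage text. An older wrapper that passed the seed directory as the first positional argument would need the `-s` flag added in the same way.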

