The "NutchTutorial" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=91&rev2=92

Comment:
After release of 1.15: remove -Dsolr.server.url=... which has no effect now; 
fix passing <Seed Dir>

  == Using the crawl script ==
  If you have followed the section above on how crawling can be done step 
by step, you might be wondering how a bash script can be written to automate 
the whole process.
  
- Nutch developers have written one for you :), and it is available at 
[[bin/crawl]].
+ Nutch developers have written one for you :), and it is available at 
[[bin/crawl]]. Here are the most common options and parameters:
  
  {{{
-      Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num 
Rounds>
+      Usage: crawl [-i|--index] [-D "key=value"] [-s <Seed Dir>] <Crawl Dir> 
<Num Rounds>
        -i|--index      Indexes crawl results into a configured indexer
-       -D              A Java property to pass to Nutch calls
+       -D...           A Java property to pass to Nutch calls
-       Seed Dir        Directory in which to look for a seeds file
+       -s <Seed Dir>   Directory in which to look for a seeds file
-       Crawl Dir       Directory where the crawl/link/segments dirs are saved
+       <Crawl Dir>     Directory where the crawl/link/segments dirs are saved
-       Num Rounds      The number of rounds to run this crawl for
+       <Num Rounds>    The number of rounds to run this crawl for
-      Example: bin/crawl -i -D 
solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/  2
+      Example: bin/crawl -i -s urls/ TestCrawl/  2
  }}}
  The crawl script has a lot of parameters set, and you can modify them to 
suit your needs. It is best to understand the parameters before setting up 
big crawls.
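
As a rough illustration of the revised usage line, the sketch below assembles a bin/crawl command from its documented parts and only prints it (a dry run), so the arguments can be checked before starting a real crawl. The helper name crawl_cmd and its argument order are hypothetical and not part of Nutch; it assumes paths without spaces and leaves out -D properties.

```shell
# Hypothetical helper (not part of Nutch): print the bin/crawl command line
# that matches the documented usage, without running it.
# Arguments: <index: yes|no> <seed dir, or "" to omit -s> <crawl dir> <num rounds>
crawl_cmd() {
  index=$1
  seed_dir=$2
  crawl_dir=$3
  rounds=$4

  cmd="bin/crawl"
  # -i|--index: index crawl results into the configured indexer
  if [ "$index" = "yes" ]; then
    cmd="$cmd -i"
  fi
  # -s <Seed Dir>: optional directory in which to look for a seeds file
  if [ -n "$seed_dir" ]; then
    cmd="$cmd -s $seed_dir"
  fi
  # <Crawl Dir> and <Num Rounds> are the positional arguments
  echo "$cmd $crawl_dir $rounds"
}

# Reproduces the example from the usage text
crawl_cmd yes urls/ TestCrawl/ 2
```

Printing the command first makes it easy to confirm that the seed directory is passed with -s rather than as a positional argument, which is the change this revision documents.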
  
