[Nutch Wiki] Update of "NutchTutorial" by SebastianNagel

Apache Wiki Wed, 15 Aug 2018 06:42:24 -0700

The "NutchTutorial" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=91&rev2=92

Comment:
After release of 1.15: remove -Dsolr.server.url=... which has no effect now; 
fix passing <Seed Dir>

  == Using the crawl script ==
  If you have followed the section above on how the crawling can be done step 
by step, you might be wondering how a bash script can be written to automate 
all the process described above.
  
- Nutch developers have written one for you :), and it is available at [[bin/crawl]].
+ Nutch developers have written one for you :), and it is available at [[bin/crawl]]. Here the most common options and parameters:
  
  {{{
-      Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
+      Usage: crawl [-i|--index] [-D "key=value"] [-s <Seed Dir>] <Crawl Dir> <Num Rounds>
        -i|--index      Indexes crawl results into a configured indexer
-       -D              A Java property to pass to Nutch calls
+       -D...           A Java property to pass to Nutch calls
-       Seed Dir        Directory in which to look for a seeds file
+       -s <Seed Dir>   Directory in which to look for a seeds file
-       Crawl Dir       Directory where the crawl/link/segments dirs are saved
+       <Crawl Dir>     Directory where the crawl/link/segments dirs are saved
-       Num Rounds      The number of rounds to run this crawl for
+       <Num Rounds>    The number of rounds to run this crawl for
-      Example: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/  2
+      Example: bin/crawl -i -s urls/ TestCrawl/  2
  }}}
  The crawl script has lot of parameters set, and you can modify the parameters 
to your needs. It would be ideal to understand the parameters before setting up 
big crawls.
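
To make the changed argument order concrete, here is a minimal dry-run sketch in bash. It only prints the command line that the new usage text implies and does not invoke Nutch. The function name build_crawl_cmd is hypothetical and not part of Nutch; the flags (-i, -s) and the sample values (urls/, TestCrawl/, 2) are taken from the usage text in the diff above.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints a bin/crawl command line instead of running it.
# build_crawl_cmd is a hypothetical name used for illustration only.
build_crawl_cmd() {
  local seed_dir="$1" crawl_dir="$2" rounds="$3"
  local cmd="bin/crawl -i"
  # In the revised usage the seed directory is optional and passed via -s,
  # so the flag is added only when a seed directory is given.
  if [ -n "$seed_dir" ]; then
    cmd="$cmd -s $seed_dir"
  fi
  # <Crawl Dir> and <Num Rounds> remain positional arguments.
  printf '%s\n' "$cmd $crawl_dir $rounds"
}

build_crawl_cmd urls/ TestCrawl/ 2   # prints: bin/crawl -i -s urls/ TestCrawl/ 2
build_crawl_cmd "" TestCrawl/ 2      # prints: bin/crawl -i TestCrawl/ 2
```

The first call reproduces the example line from the new revision. The second shows the form without a seed directory, which the bracketed [-s <Seed Dir>] in the usage line suggests is allowed.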
