Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_crawl

New page:
= nutch-0.8-dev/bin/nutch crawl =

== "crawl" is an alias for "org.apache.nutch.crawl.Crawl" ==

=== Perform complete crawling and indexing given a set of root URLs. ===

'''Usage:''' nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> 
[-dir d] [-threads n] [-depth i] [-topN N]

'''<urlDir>:''' A directory containing text files with lists of URLs. This 
directory must already exist.
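
As a minimal sketch of preparing such a directory, the commands below create it and add one list file. The names `urls` and `seed.txt` and the example URL are placeholders chosen for illustration, not names that Nutch requires.

```shell
# Hypothetical names: "urls" and "seed.txt" are placeholders for this sketch.
URL_DIR="urls"

# The directory must exist before the crawl is started.
mkdir -p "$URL_DIR"

# Write a list file with one root URL per line.
printf '%s\n' 'http://www.example.com/' > "$URL_DIR/seed.txt"

# Show the URL list that was written.
cat "$URL_DIR/seed.txt"
```

Further root URLs can be appended to the same file or placed in additional text files in the same directory.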

'''[-dir d]:''' The directory in which Nutch saves the index.
If you omit this option, Nutch creates its own directory inside the 
directory from which you started the crawl.
Example of a -dir parameter: -dir /usr/local/index/
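
The sketch below shows how a full command line with -dir might be assembled. It only prints the command (a dry run) and does not start a crawl. The `urls` directory name is an assumption; the script path and the index path are taken from the usage line and the example on this page.

```shell
# Assumed locations; adjust them to your installation.
NUTCH="nutch-0.8-dev/bin/nutch"
URL_DIR="urls"                 # hypothetical directory holding the URL lists
INDEX_DIR="/usr/local/index/"  # path from the -dir example on this page

# Build the command line and print it; nothing is crawled here.
CRAWL_CMD="$NUTCH crawl $URL_DIR -dir $INDEX_DIR"
echo "$CRAWL_CMD"
```

To start the crawl for real, run the printed command from a shell in which the Nutch script path is valid.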

'''[-threads n]:''' ''<need description>''

'''[-depth i]:''' How deep Nutch should crawl. If you omit this option, Nutch 
uses 3 as its default value.
For example, with -depth 1 Nutch indexes only the first level. Only with 
-depth 2 (or more) does Nutch follow links.
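
To make the difference concrete, the following sketch prints the command lines for a first-level-only crawl and for a crawl that follows links one level further. The helper `crawl_cmd` and the `urls` directory are hypothetical names introduced for this example; the commands are printed, not executed.

```shell
# Assumed locations; adjust them to your installation.
NUTCH="nutch-0.8-dev/bin/nutch"
URL_DIR="urls"   # hypothetical directory holding the URL lists

# Hypothetical helper: prints the crawl command for a given depth.
# With no argument, -depth is left out so that Nutch uses its default.
crawl_cmd() {
  if [ -n "$1" ]; then
    echo "$NUTCH crawl $URL_DIR -depth $1"
  else
    echo "$NUTCH crawl $URL_DIR"
  fi
}

crawl_cmd 1   # first level only
crawl_cmd 2   # first level plus one round of link following
crawl_cmd     # default depth
```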

'''[-topN N]:''' ''<need description>''

DevelopmentCommandLineOptions
