Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_crawl

------------------------------------------------------------------------------
  
  == Perform complete crawling and indexing given a set of root urls. ==
  
+ '''Configuration Files Used:''' 
+  hadoop-default.xml[[BR]]
+  hadoop-site.xml[[BR]]
+  crawl-tool.xml[[BR]]
+ 
  '''Usage:''' nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
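To show how the usage line maps onto settings, here is a minimal Java sketch that parses these arguments and applies the defaults documented for each option on this page. The class `CrawlArgs` and its `parse` method are hypothetical; this is not Nutch's own `org.apache.nutch.crawl.Crawl` code.

```java
// Hypothetical sketch of parsing the Crawl command line:
//   <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
// Not Nutch's own org.apache.nutch.crawl.Crawl code.
public class CrawlArgs {
    String urlDir;                // required, no default
    String dir;                   // null means "use the crawl-[date] default"
    int threads = 10;             // default documented for -threads
    int depth = 5;                // default documented for -depth
    int topN = Integer.MAX_VALUE; // default documented for -topN

    static CrawlArgs parse(String[] args) {
        CrawlArgs a = new CrawlArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            // Every option flag needs a following value.
            if (arg.startsWith("-") && i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + arg);
            }
            switch (arg) {
                case "-dir":     a.dir = args[++i]; break;
                case "-threads": a.threads = Integer.parseInt(args[++i]); break;
                case "-depth":   a.depth = Integer.parseInt(args[++i]); break;
                case "-topN":    a.topN = Integer.parseInt(args[++i]); break;
                default:         a.urlDir = arg; // any other token is taken as <urlDir>
            }
        }
        if (a.urlDir == null) {
            throw new IllegalArgumentException(
                "Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
        }
        return a;
    }

    public static void main(String[] args) {
        CrawlArgs a = parse(new String[] {"urls", "-depth", "3", "-topN", "50"});
        System.out.println(a.urlDir + " depth=" + a.depth + " topN=" + a.topN
            + " threads=" + a.threads); // urls depth=3 topN=50 threads=10
    }
}
```

Options left off the command line simply keep their default values, and only `<urlDir>` is mandatory.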
  
- '''<urlDir>:''' contains text files with URL lists. This must be an existing directory.
+ '''<urlDir>:''' Directory containing text files with lists of root URLs. This must be an existing directory. Default Value: ''None'' (required)
  
+ '''[-dir <d>]:''' The directory where Nutch will save the crawl files. Default Value: ''./crawl-[date]'' where [date] is the current date.
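The fallback directory name can be sketched as below. The page only says "the current date", so the `yyyyMMddHHmmss` timestamp format is an assumption, and `defaultCrawlDir` / `resolveDir` are hypothetical helpers rather than Nutch APIs.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class DefaultCrawlDir {
    // Hypothetical helper: builds "crawl-<timestamp>", a path relative to the
    // directory where the crawl was started. The timestamp format is assumed.
    static String defaultCrawlDir(Date now) {
        return "crawl-" + new SimpleDateFormat("yyyyMMddHHmmss").format(now);
    }

    // Use the -dir value when one was given, otherwise fall back to the default.
    static String resolveDir(String dirArg, Date now) {
        return dirArg != null ? dirArg : defaultCrawlDir(now);
    }

    public static void main(String[] args) {
        System.out.println(resolveDir(null, new Date()));                // e.g. crawl-<timestamp>
        System.out.println(resolveDir("/usr/local/index/", new Date())); // /usr/local/index/
    }
}
```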
- '''[-dir d]:''' You can choose the directory where Nutch should save the index.
- If you don't choose a directory, Nutch creates its own directory in the directory where you started the crawl.
- Example of a -dir parameter: -dir /usr/local/index/ 
  
- '''[-threads n]:''' ''<need description>''
+ '''[-threads <n>]:''' Number of fetcher threads to use. Overrides the configuration key ''fetcher.threads.fetch''. Default Value: ''10''
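The override rule for `-threads` can be pictured with a plain `java.util.Properties` object standing in for the Hadoop/Nutch configuration. That substitution is an assumption made to keep the sketch self-contained, and `effectiveThreads` is a hypothetical name.

```java
import java.util.Properties;

public class ThreadsOverride {
    static final String KEY = "fetcher.threads.fetch";

    // A -threads value from the command line wins and is written back to the
    // key; otherwise the configured value is used; otherwise the default of 10.
    static int effectiveThreads(Properties conf, Integer cliThreads) {
        if (cliThreads != null) {
            conf.setProperty(KEY, Integer.toString(cliThreads));
        }
        return Integer.parseInt(conf.getProperty(KEY, "10"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(effectiveThreads(conf, null)); // 10
        System.out.println(effectiveThreads(conf, 25));   // 25
    }
}
```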
  
+ '''[-depth <i>]:''' Number of crawl iterations (generate, fetch and update rounds) Nutch should run; with a depth of 1 only the root URLs are fetched. Default Value: ''5''
- '''[-depth i]:''' You can tell Nutch how deep it should crawl. If you don't give Nutch a value, it uses 3 as its default.
- For example, if you say -depth 1, Nutch would only index the first level. Only with -depth 2 (or more) would Nutch follow links.
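How depth bounds the crawl can be illustrated with a small breadth-first loop: each iteration fetches the current set of URLs, and newly discovered outlinks are only fetched in the next iteration. The in-memory link map stands in for real fetching and parsing; `DepthSketch.crawl` is hypothetical and is not how Nutch's crawl loop is implemented.

```java
import java.util.*;

public class DepthSketch {
    // Hypothetical crawl loop: round i fetches the frontier, then outlinks not
    // seen before become the frontier for round i+1. Stops after `depth` rounds.
    static List<Set<String>> crawl(Map<String, List<String>> links,
                                   Collection<String> roots, int depth) {
        List<Set<String>> rounds = new ArrayList<>();
        Set<String> seen = new HashSet<>(roots);
        Set<String> frontier = new LinkedHashSet<>(roots);
        for (int i = 0; i < depth && !frontier.isEmpty(); i++) {
            rounds.add(frontier); // URLs fetched in this round
            Set<String> next = new LinkedHashSet<>();
            for (String url : frontier) {
                for (String out : links.getOrDefault(url, Collections.emptyList())) {
                    if (seen.add(out)) {
                        next.add(out); // only URLs not fetched or queued before
                    }
                }
            }
            frontier = next;
        }
        return rounds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("d"));
        System.out.println(crawl(links, Arrays.asList("a"), 1)); // [[a]]
        System.out.println(crawl(links, Arrays.asList("a"), 2)); // [[a], [b, c]]
    }
}
```

With a depth of 1 only the root is fetched, and links are followed from a depth of 2 onward.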
  
- '''[-topN]:''' ''<need description>''
+ '''[-topN <num>]:''' Limit each iteration to at most the top <num> URLs (by score). Default Value: ''Integer.MAX_VALUE''
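A per-iteration top-N cut can be sketched as sorting candidate URLs by score and keeping the first N. The scores here are made-up inputs and `selectTopN` is a hypothetical helper; how Nutch actually assigns scores is outside this sketch.

```java
import java.util.*;
import java.util.stream.Collectors;

public class TopNSketch {
    // Hypothetical selection step: keep at most topN URLs, highest score first.
    static List<String> selectTopN(Map<String, Float> scores, int topN) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        scores.put("http://a/", 2.0f);
        scores.put("http://b/", 0.5f);
        scores.put("http://c/", 1.0f);
        System.out.println(selectTopN(scores, 2));                 // [http://a/, http://c/]
        System.out.println(selectTopN(scores, Integer.MAX_VALUE)); // [http://a/, http://c/, http://b/]
    }
}
```

With the default of `Integer.MAX_VALUE` the cut never applies in practice, so every candidate is kept.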
  
  DevelopmentCommandLineOptions
  


_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
