The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_crawl

------------------------------------------------------------------------------
  
  == Perform complete crawling and indexing given a set of root urls. ==
  
- '''Configuration Files Used:''' 
+ === Usage ===
+  nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN num]
+ 
+   '''<urlDir>:''' contains text files with URL lists. This must be an existing directory.  Default Value: ''None''[[BR]]
+   '''[-dir <d>]:''' The directory where Nutch will save the crawl files.  Default Value: ''./crawl-[date]'' where [date] is the current date.[[BR]]
+   '''[-threads <n>]:''' Number of Fetcher Threads to use.  Overrides the configuration key ''fetcher.threads.fetch''.  Default Value: ''10''[[BR]]
+   '''[-depth <i>]:''' Number of iterations Nutch should crawl. Default Value: ''5''[[BR]]
+   '''[-topN <num>]:''' Limit crawls to the top <num> links per iteration.  Default Value: ''Integer.MAX_VALUE''[[BR]]
+ 
+ === Configuration Files ===
   hadoop-default.xml[[BR]]
   hadoop-site.xml[[BR]]
   crawl-tool.xml[[BR]]
  
- '''Usage:''' nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN]
+ === Other Files ===
+  crawl-urlfilter.txt
  
+ === Caveats and Notes ===
+  None.
- '''<urlDir>:''' contains text files with URL lists. This must be an existing directory.  Default Value: ''None''
- 
- '''[-dir <d>]:''' The directory where Nutch will save the crawl files.  Default Value: ''./crawl-[date]'' where [date] is the current date.
- 
- '''[-threads <n>]:''' Number of Fetcher Threads to use.  Overrides the configuration key ''fetcher.threads.fetch''.  Default Value: ''10''
- 
- '''[-depth <i>]:''' Number of iterations Nutch should crawl. Default Value: ''5''
- 
- '''[-topN <num>]:''' Limit crawls to the top <num> links per iteration.  Default Value: ''Integer.MAX_VALUE''
  
  DevelopmentCommandLineOptions
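
The usage line and option list in the new revision can be illustrated with a small sketch that assembles a Crawl invocation. This is only an illustration, not part of Nutch: the helper name build_crawl_command and its parameter names are hypothetical, and the sketch only builds the argument list without running anything. Options left unset are simply omitted, so the tool would fall back to the defaults listed on the page.

```python
# Hypothetical helper (not part of Nutch): builds the argument list for the
# Crawl tool from the options documented on the wiki page.
def build_crawl_command(url_dir, crawl_dir=None, threads=None, depth=None,
                        top_n=None, nutch_bin="nutch-0.8-dev/bin/nutch"):
    # <urlDir> is the only required argument and has no default.
    if not url_dir:
        raise ValueError("url_dir is required")

    cmd = [nutch_bin, "org.apache.nutch.crawl.Crawl", url_dir]

    # Append only the options that were explicitly set; anything omitted
    # is left to the defaults described on the page.
    optional = (
        ("-dir", crawl_dir),
        ("-threads", threads),
        ("-depth", depth),
        ("-topN", top_n),
    )
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Example: crawl the URL lists in ./urls, three iterations deep,
    # keeping at most 1000 links per iteration.
    print(" ".join(build_crawl_command("urls", crawl_dir="crawl-test",
                                       depth=3, top_n=1000)))
```

The resulting list could be passed to a process launcher such as subprocess.run, assuming the nutch script exists at the given path and the URL directory has already been created.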
  

