Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_crawl

------------------------------------------------------------------------------
  
  == Perform complete crawling and indexing given a set of root URLs. ==
  
- '''Configuration Files Used:''' 
+ === Usage ===
+  nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN num]
+ 
+   '''<urlDir>:''' contains text files with URL lists. This must be an existing directory.  Default Value: ''None''[[BR]]
+   '''[-dir <d>]:''' The directory where Nutch will save the crawl files.  Default Value: ''./crawl-[date]'' where [date] is the current date.[[BR]]
+   '''[-threads <n>]:''' Number of Fetcher Threads to use.  Overrides the configuration key ''fetcher.threads.fetch''.  Default Value: ''10''[[BR]]
+   '''[-depth <i>]:''' Number of iterations Nutch should crawl. Default Value: ''5''[[BR]]
+   '''[-topN <num>]:''' Limit crawls to the top <num> links per iteration.  Default Value: ''Integer.MAX_VALUE''[[BR]]
+ 
+ === Configuration Files ===
   hadoop-default.xml[[BR]]
   hadoop-site.xml[[BR]]
   crawl-tool.xml[[BR]]
  
- '''Usage:''' nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN]
+ === Other Files ===
+  crawl-urlfilter.txt
  
+ === Caveats and Notes ===
+  None.
- '''<urlDir>:''' contains text files with URL lists. This must be an existing directory.  Default Value: ''None''
- 
- '''[-dir <d>]:''' The directory where Nutch will save the crawl files.  Default Value: ''./crawl-[date]'' where [date] is the current date.
- 
- '''[-threads <n>]:''' Number of Fetcher Threads to use.  Overrides the configuration key ''fetcher.threads.fetch''.  Default Value: ''10''
- 
- '''[-depth <i>]:''' Number of iterations Nutch should crawl. Default Value: ''5''
- 
- '''[-topN <num>]:''' Limit crawls to the top <num> links per iteration.  Default Value: ''Integer.MAX_VALUE''
  
  DevelopmentCommandLineOptions
  

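For readers who want to see the usage line from the changed page filled in, here is a minimal sketch. It only prints a Crawl command line and does not run it. The NUTCH_BIN variable, the crawl_cmd helper, and the "urls" and "crawl-output" directory names are assumptions made for illustration; the class name, the options, and the defaults of 10 threads and depth 5 come from the page text above.

```shell
#!/bin/sh
# Illustrative sketch only. NUTCH_BIN and crawl_cmd are hypothetical names,
# not part of Nutch; adjust the path to match your own checkout.
NUTCH_BIN="${NUTCH_BIN:-nutch-0.8-dev/bin/nutch}"

# Print the Crawl command line for the given options without running it.
#   $1 = <urlDir>  existing directory holding text files with URL lists
#   $2 = -dir      directory where the crawl files are saved
#   $3 = -threads  optional, falls back to the documented default of 10
#   $4 = -depth    optional, falls back to the documented default of 5
#   $5 = -topN     optional, omitted entirely when not supplied
crawl_cmd() {
  url_dir=$1
  crawl_dir=$2
  threads=${3:-10}
  depth=${4:-5}
  echo "$NUTCH_BIN org.apache.nutch.crawl.Crawl $url_dir -dir $crawl_dir -threads $threads -depth $depth${5:+ -topN $5}"
}

# Show the command with the documented defaults.
crawl_cmd urls crawl-output
```

Printing the command first makes it easy to check the options before starting a crawl; piping the output to sh would run it, provided the directory named as <urlDir> already exists.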