[Nutch Wiki] Update of "bin/nutch fetch" by kiranchitturi

Apache Wiki Wed, 20 Mar 2013 12:43:22 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "bin/nutch fetch" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20fetch?action=diff&rev1=1&rev2=2

  
  If there are still unfetched items in the queues, but none of the items are 
ready, FetcherThread-s will spin-wait until either some items become available, 
or a timeout is reached (at which point the Fetcher will abort, assuming the 
task is hung).
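The waiting behaviour described above can be sketched in Java. This is a minimal, hypothetical illustration, not Nutch's FetcherThread code. The following are all assumptions made for the example:

 * the class and method names
 * the per-item "ready at" time, which stands in for politeness delays
 * the fixed sleep interval
 * the timeout, which is measured per call

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/**
 * Hypothetical sketch of a fetch worker's spin-wait: poll for a ready item,
 * sleep briefly when none is ready, and give up after a timeout.
 * Names are illustrative and do not mirror Nutch's FetcherThread API.
 */
final class SpinWaitSketch {

    /** A queued URL that may not be fetched before readyAtMillis (assumed politeness delay). */
    static final class FetchItem {
        final String url;
        final long readyAtMillis;

        FetchItem(String url, long readyAtMillis) {
            this.url = url;
            this.readyAtMillis = readyAtMillis;
        }
    }

    private final Deque<FetchItem> queue = new ArrayDeque<>();

    synchronized void add(FetchItem item) {
        queue.addLast(item);
    }

    /** Removes and returns the first item that is ready at time now, or null if none is. */
    private synchronized FetchItem pollReady(long now) {
        for (Iterator<FetchItem> it = queue.iterator(); it.hasNext(); ) {
            FetchItem item = it.next();
            if (item.readyAtMillis <= now) {
                it.remove();
                return item;
            }
        }
        return null;
    }

    private synchronized boolean isEmpty() {
        return queue.isEmpty();
    }

    /**
     * Returns the next ready item, or null once the queue is drained.
     * If items remain but none becomes ready within timeoutMillis, the task
     * is assumed to be hung and an IllegalStateException is thrown.
     */
    FetchItem nextItem(long timeoutMillis, long sleepMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (isEmpty()) {
                return null; // nothing left to fetch: normal termination
            }
            long now = System.currentTimeMillis();
            FetchItem item = pollReady(now);
            if (item != null) {
                return item;
            }
            if (now >= deadline) {
                throw new IllegalStateException("no item became ready in time; aborting as hung");
            }
            try {
                Thread.sleep(sleepMillis); // back off, then check the queue again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

An empty queue ends the loop normally. A non-empty queue whose items never become ready ends in an exception, which matches the "abort, assuming the task is hung" case.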
  
+ == Nutch 1.x ==
+ 
  {{{
  Usage: bin/nutch fetch <segment> [-threads n] [-noParsing]
  }}}
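The Nutch 1.x usage line can be sketched in Java as a small argument parser. This is a hypothetical illustration of how the arguments relate, not Nutch's own Fetcher argument handling. The class name, the field names and the "null means use the configured thread count" convention are assumptions.

```java
/**
 * Hypothetical parser for the Nutch 1.x usage line
 *   bin/nutch fetch <segment> [-threads n] [-noParsing]
 * This is an illustration, not Nutch's own argument handling.
 */
final class FetchArgs {
    static final String USAGE = "Usage: bin/nutch fetch <segment> [-threads n] [-noParsing]";

    final String segment;
    final Integer threads;   // null: fall back to the configured thread count
    final boolean noParsing; // true only when -noParsing was given

    private FetchArgs(String segment, Integer threads, boolean noParsing) {
        this.segment = segment;
        this.threads = threads;
        this.noParsing = noParsing;
    }

    static FetchArgs parse(String[] args) {
        // The segment is the mandatory first, positional argument.
        if (args.length == 0 || args[0].startsWith("-")) {
            throw new IllegalArgumentException(USAGE);
        }
        String segment = args[0];
        Integer threads = null;
        boolean noParsing = false;
        for (int i = 1; i < args.length; i++) {
            if (args[i].equals("-threads")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-threads needs a value");
                }
                threads = Integer.parseInt(args[++i]);
            } else if (args[i].equals("-noParsing")) {
                noParsing = true;
            } else {
                throw new IllegalArgumentException("unexpected argument: " + args[i] + "\n" + USAGE);
            }
        }
        return new FetchArgs(segment, threads, noParsing);
    }
}
```

For example, `FetchArgs.parse(new String[]{"crawl/segments/20130320", "-threads", "10"})` yields that segment, 10 threads, and `noParsing == false`. The segment path is made up for illustration.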
@@ -22, +24 @@

  
  '''[-noParsing]''': Disables parsing during the fetch. If this argument is not 
passed, the default set in nutch-default.xml applies, and that default is not to 
parse. Parsing is off by default because errors that can occur while parsing 
segments may corrupt the results of the whole fetching process. Note that 
parsing will only follow meta-redirects coming from the original URL.
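The precedence described above is that an explicit -noParsing disables parsing, and otherwise the configured default applies, which is "do not parse". It can be sketched as a small Java helper. A plain `Map` stands in for the configuration loaded from nutch-default.xml. The property name `fetcher.parse` and the fallback to `false` when it is missing are assumptions for this example.

```java
import java.util.Map;

/**
 * Hypothetical sketch of resolving whether to parse while fetching.
 * The Map stands in for the loaded configuration; the property name
 * "fetcher.parse" and the false fallback are assumptions.
 */
final class ParseSetting {
    static final String PARSE_PROPERTY = "fetcher.parse"; // assumed property name

    static boolean shouldParse(Map<String, String> conf, boolean noParsingFlag) {
        if (noParsingFlag) {
            return false; // an explicit -noParsing always disables parsing
        }
        String value = conf.get(PARSE_PROPERTY);
        // Missing value: fall back to the documented default of not parsing.
        return value != null && Boolean.parseBoolean(value.trim());
    }
}
```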
  
+ == Nutch 2.x ==
+ 
+ {{{
+ Usage: FetcherJob (<batchId> | -all) [-crawlId <id>] [-threads N] [-resume] [-numTasks N]
+        <batchId>     - crawl identifier returned by Generator, or -all for all
+                   generated batchId-s
+        -crawlId <id> - the id to prefix the schemas to operate on, 
+                   (default: storage.crawl.id)
+        -threads N    - number of fetching threads per task
+        -resume       - resume interrupted job
+        -numTasks N   - if N > 0 then use this many reduce tasks for fetching 
+                   (default: mapred.map.tasks)
+ 
+ }}}
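The Nutch 2.x FetcherJob usage can be sketched in the same way. This is a hypothetical Java parser, not the FetcherJob source. The following are illustrative assumptions:

 * the class and field names
 * the use of `null` to mean "fall back to storage.crawl.id or the configured thread count"
 * the `effectiveNumTasks` helper, which applies the "if N > 0" rule with a caller-supplied default in place of mapred.map.tasks

```java
/**
 * Hypothetical parser for the Nutch 2.x FetcherJob usage:
 *   (<batchId> | -all) [-crawlId <id>] [-threads N] [-resume] [-numTasks N]
 * Names and defaults are illustrative only.
 */
final class FetcherJobArgs {
    static final String USAGE =
        "Usage: FetcherJob (<batchId> | -all) [-crawlId <id>] [-threads N] [-resume] [-numTasks N]";

    final String batchId;  // null when -all was given (fetch every generated batch)
    final boolean all;
    final String crawlId;  // null: fall back to storage.crawl.id
    final Integer threads; // null: fall back to the configured thread count
    final boolean resume;
    final int numTasks;    // <= 0: use the default task count

    private FetcherJobArgs(String batchId, boolean all, String crawlId,
                           Integer threads, boolean resume, int numTasks) {
        this.batchId = batchId;
        this.all = all;
        this.crawlId = crawlId;
        this.threads = threads;
        this.resume = resume;
        this.numTasks = numTasks;
    }

    static FetcherJobArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException(USAGE);
        }
        // First argument: either a batch id or -all.
        boolean all = args[0].equals("-all");
        if (!all && args[0].startsWith("-")) {
            throw new IllegalArgumentException("expected <batchId> or -all\n" + USAGE);
        }
        String batchId = all ? null : args[0];
        String crawlId = null;
        Integer threads = null;
        boolean resume = false;
        int numTasks = -1;
        for (int i = 1; i < args.length; i++) {
            switch (args[i]) {
                case "-crawlId": crawlId = value(args, ++i, "-crawlId"); break;
                case "-threads": threads = Integer.parseInt(value(args, ++i, "-threads")); break;
                case "-resume": resume = true; break;
                case "-numTasks": numTasks = Integer.parseInt(value(args, ++i, "-numTasks")); break;
                default: throw new IllegalArgumentException("unexpected argument: " + args[i] + "\n" + USAGE);
            }
        }
        return new FetcherJobArgs(batchId, all, crawlId, threads, resume, numTasks);
    }

    private static String value(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " needs a value");
        }
        return args[i];
    }

    /** Use N tasks if N > 0, otherwise the supplied default (standing in for mapred.map.tasks). */
    int effectiveNumTasks(int defaultTasks) {
        return numTasks > 0 ? numTasks : defaultTasks;
    }
}
```

With `-all -numTasks 4`, the sketch selects every generated batch and four tasks. With a single batch id such as the made-up `batch-001` and no `-numTasks`, it falls back to the supplied default.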
+ 
  CommandLineOptions
