Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch fetch" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20fetch?action=diff&rev1=1&rev2=2

  If there are still unfetched items in the queues, but none of the items are ready, FetcherThread-s will spin-wait until either some items become available, or a timeout is reached (at which point the Fetcher will abort, assuming the task is hung).

+ == Nutch 1.x ==
+
  {{{
  Usage: bin/nutch fetch <segment> [-threads n] [-noParsing]
  }}}

@@ -22, +24 @@

  '''[-noParsing]''': If no argument is passed, this value is the default, as set in nutch-default.xml. This is the case due to errors which can occur when parsing segments. If parsing errors occur, the results of the whole fetching process can be corrupted. Note that parsing will only follow meta-redirects coming from the original URL.

+ == Nutch 2.x ==
+
+ {{{
+ Usage: FetcherJob (<batchId> | -all) [-crawlId <id>] [-threads N] [-resume] [-numTasks N]
+     <batchId>     - crawl identifier returned by Generator, or -all for all
+                     generated batchId-s
+     -crawlId <id> - the id to prefix the schemas to operate on,
+                     (default: storage.crawl.id)
+     -threads N    - number of fetching threads per task
+     -resume       - resume interrupted job
+     -numTasks N   - if N > 0 then use this many reduce tasks for fetching
+                     (default: mapred.map.tasks)
+
+ }}}
+
  CommandLineOptions
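
As a rough illustration of the two usage strings above, the following dry-run sketch only builds and prints example command lines; it does not run Nutch. The segment path, the thread count, and the variable names are hypothetical, and the 2.x line assumes that `bin/nutch fetch` passes its arguments through to FetcherJob, which the diff itself does not state.

```shell
#!/bin/sh
# Dry-run sketch: build the fetch command lines and print them instead of
# executing them. All concrete values below are hypothetical examples.

SEGMENT="crawl/segments/20130101000000"   # hypothetical 1.x segment directory
BATCH_ID="-all"                           # 2.x: a batchId from the generator, or -all
THREADS=10                                # hypothetical number of fetcher threads

# Nutch 1.x: fetch one segment, passing the -noParsing flag from the usage line.
CMD_1X="bin/nutch fetch $SEGMENT -threads $THREADS -noParsing"

# Nutch 2.x (assumed wrapper around FetcherJob): fetch all generated
# batches and resume an interrupted job.
CMD_2X="bin/nutch fetch $BATCH_ID -threads $THREADS -resume"

echo "$CMD_1X"
echo "$CMD_2X"
```

Printing the commands first makes it easy to check the arguments before starting a long-running fetch; removing the `echo` wrappers would be needed to run them against a real installation.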

