The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate

New page:
= "generate" is an alias for "org.apache.nutch.crawl.Generator" =

== Generates a new Fetcher Segment from the Crawl Database ==

=== Usage ===
 nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Generator <crawldb> <segments_dir> [-topN <num>] [-numFetchers <fetchers>] [-adddays <days>]

  '''<crawldb>:''' Path to the crawldb directory.[[BR]]
  '''<segments_dir>:''' Path to the directory where the Fetcher Segments are created.[[BR]]
  '''[-topN <num>]:''' Selects the ''<num>'' top-ranking URLs for this segment. Default: ''Long.MAX_VALUE'' (effectively no limit)[[BR]]
  '''[-numFetchers <fetchers>]:''' The number of fetch partitions. Default: the value of the configuration key ''mapred.map.tasks'', or ''1'' if that key is not set[[BR]]
  '''[-adddays <days>]:''' Adds ''<days>'' to the current time, so that URLs already fetched become due for crawling sooner than ''db.default.fetch.interval'' would otherwise allow. Default: ''0''[[BR]]
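
As a rough illustration of the usage above, the following shell sketch assembles a "generate" command line and prints it instead of running it. The paths crawl/crawldb and crawl/segments and all option values are hypothetical placeholders, and the script assumes it is started from the directory that contains nutch-0.8-dev. The seconds figure only illustrates how far ''-adddays'' shifts the clock; it is not taken from the Generator source.

```shell
#!/bin/sh
# Sketch only: builds a "generate" command line and prints it (dry run).
# All paths and option values below are hypothetical placeholders.
NUTCH="nutch-0.8-dev/bin/nutch"
CRAWLDB="crawl/crawldb"      # <crawldb>: path to the crawldb directory
SEGMENTS="crawl/segments"    # <segments_dir>: where new segments are created
TOPN=1000                    # -topN: keep only the 1000 top-ranking URLs
NUM_FETCHERS=2               # -numFetchers: number of fetch partitions
ADD_DAYS=7                   # -adddays: pretend it is 7 days later

# "generate" is the alias for org.apache.nutch.crawl.Generator.
CMD="$NUTCH generate $CRAWLDB $SEGMENTS -topN $TOPN -numFetchers $NUM_FETCHERS -adddays $ADD_DAYS"

# Illustration: the clock shift that ADD_DAYS represents, in seconds.
ADD_SECONDS=$((ADD_DAYS * 24 * 60 * 60))

echo "$CMD"
echo "clock shifted forward by $ADD_SECONDS seconds"
```

Printing the command first makes it easy to check the argument order (crawldb, then segments directory, then options) before pointing it at a real crawl database.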

=== Configuration Files ===
 hadoop-default.xml[[BR]]
 hadoop-site.xml[[BR]]
 nutch-default.xml[[BR]]
 nutch-site.xml[[BR]]

=== Other Files ===
 None.

=== Caveats and Notes ===
 None.

DevelopmentCommandLineOptions
