The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate

New page:
= "generate" is an alias for "org.apache.nutch.crawl.Generator" =
== Generates a new Fetcher Segment from the Crawl Database ==
=== Usage ===
nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Generator <crawldb> <segments_dir> [-topN <num>] [-numFetchers <fetchers>] [-adddays <days>]

'''<crawldb>:''' Path to the crawldb directory.[[BR]]
'''<segments_dir>:''' Path to the directory where the Fetcher Segments are created.[[BR]]
'''[-topN <num>]:''' Selects the top ''<num>'' ranking URLs for this segment. Default: ''Long.MAX_VALUE''[[BR]]
'''[-numFetchers <fetchers>]:''' The number of fetch partitions. Default: ''Configuration key -> mapred.map.tasks -> 1''[[BR]]
'''[-adddays <days>]:''' Adds ''<days>'' to the current time, so that URLs already fetched can be crawled again sooner than ''db.default.fetch.interval''. Default: ''0''[[BR]]

=== Configuration Files ===
hadoop-default.xml[[BR]]
hadoop-site.xml[[BR]]
nutch-default.xml[[BR]]
nutch-site.xml[[BR]]

=== Other Files ===
None.

=== Caveats and Notes ===
None.

DevelopmentCommandLineOptions
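
The way ''-topN'' and ''-adddays'' interact, as described under Usage, can be illustrated with a small Python sketch. This is not the Generator's actual code: the `CrawlEntry` record, its field names, the `select_for_segment` function and the example URLs are all hypothetical, and the sketch assumes that a URL counts as due once its next fetch time is no later than the current time plus ''<days>''.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


@dataclass
class CrawlEntry:
    """Hypothetical stand-in for one crawldb record (not Nutch's real class)."""
    url: str
    score: float
    next_fetch_ms: int  # time at which the URL becomes due, in epoch millis


def select_for_segment(entries, now_ms, top_n=None, add_days=0):
    """Pick the URLs that would go into a new segment (illustrative only).

    add_days shifts the reference time forward, so entries that would only
    become due within the next add_days days are treated as due now.
    top_n=None means "no limit", mirroring the Long.MAX_VALUE default.
    """
    cutoff_ms = now_ms + add_days * DAY_MS
    due = [e for e in entries if e.next_fetch_ms <= cutoff_ms]
    # Highest score first; list.sort is stable, so ties keep input order.
    due.sort(key=lambda e: e.score, reverse=True)
    return due if top_n is None else due[:top_n]


now = 1_000 * DAY_MS
entries = [
    CrawlEntry("http://example.com/a", 2.0, now - DAY_MS),       # already due
    CrawlEntry("http://example.com/b", 5.0, now + 10 * DAY_MS),  # due in 10 days
    CrawlEntry("http://example.com/c", 1.0, now),                # due right now
    CrawlEntry("http://example.com/d", 3.0, now + 40 * DAY_MS),  # due in 40 days
]

# Defaults: no limit, no time shift. Only entries that are already due qualify.
default_run = [e.url for e in select_for_segment(entries, now)]
# Shift the clock by 30 days and keep the two best-scoring due entries.
shifted_run = [e.url for e in select_for_segment(entries, now, top_n=2, add_days=30)]
print(default_run)
print(shifted_run)
```

With the defaults only the entries that are already due are selected. With a 30-day shift the entry due in 10 days also qualifies and, having the highest score, comes first in the top two.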
