Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate

The comment on the change is:
Examples, Config Values

------------------------------------------------------------------------------
  nutch-default.xml[[BR]]
  nutch-site.xml[[BR]]
+ === Configuration Values ===
+ The following properties directly affect how the Generator generates fetch segments.[[BR]][[BR]]
+ generate.max.per.host -- Sets the maximum number of URLs from a single host to be generated for this fetch run. Default: unlimited.[[BR]]
+ 
  === Other Files ===
  None.

@@ -26, +30 @@

  Differences from 0.7.1
  One major change from 0.7.1 was that -numFetchers was used to influence the number of fetcher segments created. For instance, if -numFetchers 2 was specified, there would be 2 fetcher segments created under <segments_dir>. Under 0.8 this is no longer the case.

+ === Examples ===
+ {{{
+ nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments
+ }}}
+ This example will generate a fetch list that contains all URLs ready to be fetched from the Crawl Database. The Crawl Database is located at /my/crawldb and the Generator will output the fetch list to /my/segments/yyyyMMddHHmmss.
+ 
+ {{{
+ nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments -topN 100 -adddays 20
+ }}}
+ In this example the Generator will add 20 days to the current date/time when determining the top 100 scoring pages to fetch.
+ 
  DevelopmentCommandLineOptions
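
The yyyyMMddHHmmss segment name mentioned in the first example can be hard to picture, so here is a minimal sketch of what such a path looks like. It does not call Nutch at all: it only formats the current time with `date`, and it assumes (this is not stated on the page) that the Generator names the segment after the local time at which it runs. The `/my/segments` path is the one used in the examples; the variable names are hypothetical.

```shell
#!/bin/sh
# Illustration only: build a path shaped like the Generator's output
# directory, <segments_dir>/yyyyMMddHHmmss. Nutch itself is not invoked.

# Segments directory taken from the examples on the page.
segments_dir=/my/segments

# yyyyMMddHHmmss expressed as a date(1) format string:
# 4-digit year, then month, day, hour, minute, second (2 digits each).
# Assumption: local time is used for the segment name.
stamp=$(date +%Y%m%d%H%M%S)

segment_path="$segments_dir/$stamp"
echo "$segment_path"
```

The printed path has a 14-digit final component, which is a quick way to recognise Generator output when listing the segments directory.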
