The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate

The comment on the change is:
Examples, Config Values

------------------------------------------------------------------------------
   nutch-default.xml[[BR]]
   nutch-site.xml[[BR]]
  
+ === Configuration Values ===
+  The following properties directly affect how the Generator generates fetch 
segments.[[BR]][[BR]]
+   generate.max.per.host -- Sets the maximum number of URLs from a single host 
to be generated for this fetch run.  Default: unlimited.[[BR]]
+   
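As a rough illustration, a per-host limit could be set by overriding the property in nutch-site.xml. The fragment below is a sketch only: the value 50 is an arbitrary example, and it assumes the usual <property> layout found in nutch-default.xml, placed inside the file's existing root element.

```xml
<!-- Sketch: place inside the existing root element of nutch-site.xml. -->
<property>
  <name>generate.max.per.host</name>
  <!-- 50 is an arbitrary example value, not a recommended setting. -->
  <value>50</value>
</property>
```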
  === Other Files ===
   None.
  
@@ -26, +30 @@

   Differences from 0.7.1
    In 0.7.1, -numFetchers influenced the number of fetcher segments created.  
For instance, if -numFetchers 2 was specified, 2 fetcher segments were created 
under <segments_dir>.  Under 0.8 this is no longer the case.
  
+ === Examples ===
+ {{{
+  nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments
+ }}}
+  This example will generate a fetch list that contains all URLs ready to be 
fetched from the Crawl Database. The Crawl Database is located at /my/crawldb 
and the Generator will output the fetch list to /my/segments/yyyyMMddHHmmss.
+ 
+ {{{
+  nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments -topN 100 -adddays 20
+ }}}
+  In this example the Generator adds 20 days to the current date/time when 
deciding which URLs are due for fetching, then limits the fetch list to the 
100 top-scoring pages.  
+ 
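Because each run writes its fetch list to a new timestamp-named directory, a script that drives later steps needs some way to find that directory. The sketch below shows one possible approach. The function latest_segment is a hypothetical helper, not part of Nutch, and it assumes segment directories follow the yyyyMMddHHmmss naming shown in the first example.

```shell
# latest_segment DIR
# Hypothetical helper (not part of Nutch): prints the newest
# timestamp-named entry directly under DIR, or nothing if none exist.
latest_segment() {
  # Names in yyyyMMddHHmmss form sort chronologically when sorted as text,
  # so the last entry after sorting is the most recent one.
  ls -d "$1"/[0-9]* 2>/dev/null | sort | tail -n 1
}

# Possible usage after a generate run (paths are the example paths above):
#   nutch-0.8-dev/bin/nutch generate /my/crawldb /my/segments -topN 100
#   segment=$(latest_segment /my/segments)
```

Sorting by name rather than by modification time keeps the result stable even if an older segment directory is touched later.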
  DevelopmentCommandLineOptions
  
