Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch generate" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/bin/nutch%20generate?action=diff&rev1=4&rev2=5

Comment:
Add information about scope (per segment / over all segments) of -topN and 
generate.max.count when multiple segments are generated

  
  '''<segments_dir>''': Path to the segments directory in which the Fetcher 
Segments are created.
  
- '''[-force]''': This arguement will force an update even if there appears to 
be a lock. /!\ : CAUTION: advised /!\
+ '''[-force]''': This argument will force an update even if there appears to 
be a lock. /!\ : CAUTION: advised /!\
  
  '''[-topN N]''': N is the number of top-scoring URLs to select. Normally, 
the "generate" command prepares a fetchlist from all unfetched pages and all 
pages whose fetch interval has already expired. With -topN, instead of all of 
these URLs you get only the N URLs with the highest score, which are 
potentially the most interesting ones and should be prioritized in fetching.
  
@@ -27, +27 @@

  
  '''[-noNorm]''': The same applies to the normalisation parameter as to the 
filtering option above.
  
- '''[-maxNumSegments num]''': The (maximum) number of segments to be 
generated. Default: 1
+ '''[-maxNumSegments num]''': The (maximum) number of segments to be 
generated. Default: 1 -- Note: if multiple segments are generated, the limit 
-topN applies to the total number of URLs for all segments taken together, 
while generate.max.count is applied to every generated segment individually. 
  
  ==== Configuration Files ====
   hadoop-default.xml<<BR>>
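
The scope rule from the note on -maxNumSegments (-topN limits the total over all segments, generate.max.count limits each segment separately) can be illustrated with a small Python model. This is a hypothetical sketch, not Nutch's Generator code: `CrawlEntry`, `select_candidates` and `assign_segments` are made-up names, URLs are counted per host, a negative max count stands for "no limit", and giving every segment an equal share of the selected URLs and skipping URLs that fit into no segment are simplifying assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from math import ceil
from typing import List, Optional


@dataclass
class CrawlEntry:
    """Hypothetical stand-in for a CrawlDb record (not Nutch's CrawlDatum)."""
    url: str
    host: str
    score: float
    unfetched: bool
    next_fetch_time: int  # compared against `now`; the time unit is arbitrary


def select_candidates(entries: List[CrawlEntry], now: int,
                      top_n: Optional[int] = None) -> List[CrawlEntry]:
    """Keep unfetched or due entries, best score first, capped at top_n in total."""
    due = [e for e in entries if e.unfetched or e.next_fetch_time <= now]
    due.sort(key=lambda e: e.score, reverse=True)
    return due if top_n is None else due[:top_n]


def assign_segments(candidates: List[CrawlEntry], max_num_segments: int = 1,
                    max_count: int = -1) -> List[List[CrawlEntry]]:
    """Spread already-selected candidates over up to max_num_segments segments.

    max_count limits the URLs per host inside *each* segment (negative = no
    limit). Capping every segment at an equal share of the candidates is a
    simplifying assumption of this sketch.
    """
    if not candidates:
        return []
    seg_size = ceil(len(candidates) / max_num_segments)
    segments: List[List[CrawlEntry]] = [[] for _ in range(max_num_segments)]
    host_counts = [Counter() for _ in range(max_num_segments)]
    for entry in candidates:
        # Put the entry into the first segment that has room and where its
        # host is still below max_count; if there is none, it is skipped.
        for seg, counts in zip(segments, host_counts):
            host_ok = max_count < 0 or counts[entry.host] < max_count
            if len(seg) < seg_size and host_ok:
                seg.append(entry)
                counts[entry.host] += 1
                break
    return [seg for seg in segments if seg]


if __name__ == "__main__":
    db = [
        CrawlEntry("a1", "a.example", 5.0, True, 0),
        CrawlEntry("a2", "a.example", 4.0, True, 0),
        CrawlEntry("a3", "a.example", 3.0, False, 50),   # fetched, now due
        CrawlEntry("b1", "b.example", 2.0, False, 200),  # fetched, not yet due
        CrawlEntry("c1", "c.example", 1.0, True, 0),
    ]
    chosen = select_candidates(db, now=100, top_n=4)
    segs = assign_segments(chosen, max_num_segments=2, max_count=1)
    print([[e.url for e in s] for s in segs])  # [['a1', 'c1'], ['a2']]
```

With a top-N of 4, two segments and a per-segment host limit of 1, this model places one a.example URL in each segment (two in total, so the host limit is not a global one), drops a3 because both segments already hold a URL from that host, and never exceeds 4 URLs across all segments together.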
