Hi Lewis, I think generate.max.count is used by anyone who wants to limit the number of URLs per domain (host). See http://wiki.apache.org/nutch/Nutch2Crawling#Reducer
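To illustrate what I mean by a per-host limit, here is a minimal sketch. It is not the actual GeneratorReducer code: the class name MaxCountSketch and the method limitPerHost are made up for this example, and it assumes the URLs arrive already sorted by score and that a negative limit means "no limit".

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
public class MaxCountSketch {

    // Keeps at most maxCount URLs per host, preserving input order.
    // Assumption: a negative maxCount means "unlimited".
    static List<String> limitPerHost(List<String> urls, long maxCount) {
        Map<String, Integer> seenPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // skip URLs without a host part
            }
            // Increment the counter for this host and read the new value.
            int count = seenPerHost.merge(host, 1, Integer::sum);
            if (maxCount < 0 || count <= maxCount) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://a.example/1",
                "http://a.example/2",
                "http://a.example/3",
                "http://b.example/1");
        // With a limit of 2, the third a.example URL is dropped.
        System.out.println(limitPerHost(urls, 2));
    }
}
```

Counting per domain instead of per host would only change how the map key is derived from the URL.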
The generate.min.score property is already defined in nutch-default.xml. The generate.(filter|normalise|topN) settings can be passed on the Generator command line, so I think that is a little more flexible than defining them in nutch-default.xml.

I see the generate.count.mode property description in nutch-default.xml:

<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Determines how the URLs are counted for generator.max.count.
  Default value is 'host' but can be 'domain'. Note that we do not count
  per IP in the new version of the Generator.
  </description>
</property>

Maybe the GENERATOR_COUNT_VALUE_IP mode will be added in a future version of the Generator.

On Thu, Feb 21, 2013 at 5:05 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> Following on from a discussion on user@ I dived into the GeneratorJob
> code and have the following general comment based on my observation...
> Usage of configuration options is really unstructured and loosely applied.
> This should not be the case. For example
>
> Observations
> ===========
>
> nutch-default.xml
> ---------------------
> - generate.max.count property appears here but I cannot see for the life
> of me where it actually is used in the GeneratorJob, Mapper or Reducer.
>
> Unused in GeneratorJob
> --------------------------------
> - GENERATOR_MIN_SCORE - seems not to be used
> - GENERATOR_MAX_COUNT - seems not to be used
>
> Missing in nutch-default.xml
> ------------------------------------
> - generate.min.score - but used in GeneratorJob
> - generate.filter - set to true by default and available as a CLI
> override but should also be specified in nutch-default.xml
> - generate.normalise - set to true by default and available as a CLI
> override but should also be specified in nutch-default.xml
> - generate.topN - set to 2^63-1 by default and available as a CLI
> override but should also be specified in nutch-default.xml
>
> Suggestions to add
> --------------------------
> - GENERATOR_COUNT_VALUE_IP - We should add a @Deprecated on this static
> element. I am not sure if it is used... I don't think it is.
>
> Any comments on this please?
>
> [0] http://www.mail-archive.com/user%40nutch.apache.org/msg08854.html
>
> --
> *Lewis*
>

--
Don't Grow Old, Grow Up... :-)

