Hi,

Following on from a discussion on user@ [0], I dived into the GeneratorJob code and have the following general comment based on my observations: usage of configuration options is unstructured and loosely applied. This should not be the case. For example:

Observations
============

nutch-default.xml
-----------------
- generate.max.count - this property appears here, but I cannot find where it is actually used in the GeneratorJob, Mapper or Reducer.

Unused in GeneratorJob
----------------------
- GENERATOR_MIN_SCORE - seems not to be used
- GENERATOR_MAX_COUNT - seems not to be used

Missing in nutch-default.xml
----------------------------
- generate.min.score - but used in GeneratorJob
- generate.filter - set to true by default and available as a CLI override, but should also be specified in nutch-default.xml
- generate.normalise - set to true by default and available as a CLI override, but should also be specified in nutch-default.xml
- generate.topN - set to 2^63-1 by default and available as a CLI override, but should also be specified in nutch-default.xml

Suggestions to add
------------------
- GENERATOR_COUNT_VALUE_IP - We should add @Deprecated to this static field. I am not sure if it is used... I don't think it is.

Any comments on this please?

[0] http://www.mail-archive.com/user%40nutch.apache.org/msg08854.html

--
*Lewis*

