Hi,
Following on from a discussion on user@ [0], I dived into the GeneratorJob
code and have a general comment based on what I found: configuration
options are used in an unstructured and inconsistent way. This should not
be the case. Some examples follow.

Observations
===========

nutch-default.xml
---------------------
 - generate.max.count property appears here but I cannot see for the life
of me where it actually is used in the GeneratorJob, Mapper or Reducer.

Unused in GeneratorJob
--------------------------------
 - GENERATOR_MIN_SCORE - seems not to be used
 - GENERATOR_MAX_COUNT - seems not to be used

Missing in nutch-default.xml
------------------------------------
 - generate.min.score - but used in GeneratorJob
 - generate.filter - set to true by default and available as a CLI override
but should also be specified in nutch-default.xml
 - generate.normalise - set to true by default and available as a CLI
override but should also be specified in nutch-default.xml
 - generate.topN - set to 2^63-1 (Long.MAX_VALUE) by default and available
as a CLI override but should also be specified in nutch-default.xml

Suggestions to add
--------------------------
 - GENERATOR_COUNT_VALUE_IP - We should add a @Deprecated on this static
element. I am not sure if it is used... I don't think it is.
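
To illustrate the deprecation suggested above, here is a minimal sketch. The
holder class name and the constant's value are assumptions made for
illustration only; they are not copied from the Nutch source.

```java
// Hypothetical holder class, used only to keep the sketch self-contained.
final class GeneratorConstantsSketch {

    /**
     * @deprecated Appears to be unused; kept only so existing callers still compile.
     */
    @Deprecated
    static final String GENERATOR_COUNT_VALUE_IP = "ip"; // value assumed for illustration

    private GeneratorConstantsSketch() {
        // constants only, no instances
    }
}
```

With the annotation in place, the compiler should warn at any remaining use of
the constant, which would also help confirm whether it is really unused.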

Any comments on this please?

[0] http://www.mail-archive.com/user%40nutch.apache.org/msg08854.html

-- 
*Lewis*
