Marvin Humphrey wrote:
> The only answer seems to be to apply different lengthNorm algos to
> different fields.
FYI, Nutch uses the following:
http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NutchSimilarity.java?view=markup
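To make the per-field idea concrete, here is a minimal standalone sketch. It does not depend on Lucene and is not a copy of NutchSimilarity: the class name, the field names ("url", "anchor", "content"), the MIN_CONTENT_LENGTH value and the formula chosen for each field are all assumptions made for illustration. The only part taken from Lucene is the fallback, 1/sqrt(numTokens), which is the default length norm.

```java
// Illustrative sketch only: field names, constants and per-field formulas
// are assumptions, not the actual NutchSimilarity code.
final class PerFieldLengthNorm {

    // Hypothetical floor: content shorter than this is treated as this long.
    static final int MIN_CONTENT_LENGTH = 1000;

    // Lucene's default length norm: 1 / sqrt(numTokens).
    static float defaultNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Pick a normalization per field instead of one formula for all fields.
    static float lengthNorm(String fieldName, int numTokens) {
        int n = Math.max(numTokens, 1); // guard against division by zero
        if ("url".equals(fieldName)) {
            // Linear: strongly prefer short URLs.
            return 1.0f / n;
        } else if ("anchor".equals(fieldName)) {
            // Logarithmic: barely penalize pages with lots of anchor text.
            return (float) (1.0 / Math.log(Math.E + n));
        } else if ("content".equals(fieldName)) {
            // Do not reward very short content: clamp the length from below.
            return defaultNorm(Math.max(n, MIN_CONTENT_LENGTH));
        }
        return defaultNorm(n);
    }

    public static void main(String[] args) {
        String[] fields = {"url", "anchor", "content", "title"};
        for (String field : fields) {
            System.out.println(field + " @ 100 tokens: " + lengthNorm(field, 100));
        }
    }
}
```

In a real index the same branching would live in a Similarity subclass; which curve suits which field is exactly the sort of thing that needs tuning against data.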
All of this is seat-of-the-pants, developed by hand-tuning a few
queries. Like code optimization, relevance tuning is better done with
large amounts of real data. If you have trusted relevant/non-relevant
judgements for a representative sample of queries, then you can do a
much better job of setting these parameters. Unfortunately, such
judgements are expensive to generate.
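
Given such judgements, one common way to compare parameter settings is to score each setting's rankings with mean average precision (MAP). The sketch below is one possible way to compute it, not something Nutch ships: the class and method names are hypothetical, and it assumes each ranked list contains no duplicate document ids and that any unjudged document counts as non-relevant.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: names are hypothetical, not part of Lucene or Nutch.
final class RelevanceEval {

    // Average precision for one query: the mean of precision@k taken at each
    // rank k where a relevant document appears, divided by the total number
    // of relevant documents (so relevant documents never retrieved count as 0).
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    // Mean of averagePrecision over every judged query; a query with no
    // results in `runs` contributes 0.
    static double meanAveragePrecision(Map<String, List<String>> runs,
                                       Map<String, Set<String>> judgements) {
        if (judgements.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : judgements.entrySet()) {
            List<String> ranked = runs.getOrDefault(e.getKey(), Collections.<String>emptyList());
            total += averagePrecision(ranked, e.getValue());
        }
        return total / judgements.size();
    }
}
```

Running the same query set under two different lengthNorm settings and comparing the two MAP values gives a number to tune against, rather than eyeballing a handful of queries.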
For Web data, one source of relevance judgements is:
http://ir.dcs.gla.ac.uk/test_collections/
Doug