Marvin Humphrey wrote:
> The only answer seems to be to apply different lengthNorm algos to different fields.

FYI, Nutch uses the following:

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NutchSimilarity.java?view=markup
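To make the per-field idea concrete, here is a minimal standalone sketch. It does not reproduce NutchSimilarity.java and does not depend on Lucene: the class name, the field names ("title", "body"), the MIN_BODY_TOKENS floor, and the particular formulas are all assumptions chosen for illustration. The fallback, 1/sqrt(numTokens), mirrors what Lucene's DefaultSimilarity computes.

```java
// Hypothetical sketch: field names, constants and formulas are illustrative
// assumptions, not the contents of NutchSimilarity.java.
final class PerFieldLengthNorm {
    // Assumed floor, so that very short bodies are not rewarded just for being short.
    static final int MIN_BODY_TOKENS = 1000;

    static float lengthNorm(String fieldName, int numTokens) {
        int n = Math.max(numTokens, 1); // guard against division by zero
        switch (fieldName) {
            case "title":
                // Gentle logarithmic damping: long titles are penalized only mildly.
                return (float) (1.0 / Math.log(Math.E + n));
            case "body":
                // Same shape as the fallback, but lengths below the floor are treated as the floor.
                return (float) (1.0 / Math.sqrt(Math.max(n, MIN_BODY_TOKENS)));
            default:
                // 1/sqrt(numTokens), as in Lucene's DefaultSimilarity.
                return (float) (1.0 / Math.sqrt(n));
        }
    }

    public static void main(String[] args) {
        // Print the norm for a short and a long value of each field.
        for (String f : new String[] {"title", "body", "other"}) {
            System.out.println(f + ": " + lengthNorm(f, 10) + " / " + lengthNorm(f, 5000));
        }
    }
}
```

In a real index the same dispatch would live in a Similarity subclass; which fields get which curve is exactly the kind of parameter that needs tuning against data.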

All of this is seat-of-the-pants, developed by hand-tuning a few queries. Like code optimization, relevance tuning is better done with large amounts of real data. If you have trusted relevant/non-relevant judgements for a representative sample of queries, then you can do a much better job of setting these parameters. Unfortunately, such judgements are expensive to generate.

For Web data, one source of relevance judgements is:

http://ir.dcs.gla.ac.uk/test_collections/
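Given such judgements, candidate parameter settings can be compared by scoring the ranking each one produces. The sketch below is one possible way to do that, assuming judgements arrive as a set of relevant document ids per query and a ranking as an ordered list of ids. The class and method names are hypothetical, and precision at k and average precision are just two common measures, not something prescribed above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for scoring one ranked result list against relevance judgements.
final class JudgedEval {
    /** Fraction of the top k positions filled by documents judged relevant. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) return 0.0;
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        // Missing positions (ranking shorter than k) count as non-relevant.
        return (double) hits / k;
    }

    /** Mean of the precision values at each rank where a relevant document appears. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        // Relevant documents that were never retrieved contribute zero.
        return sum / relevant.size();
    }
}
```

Averaging either measure over a representative query sample gives a single number per parameter setting, which is what makes tuning against data less seat-of-the-pants than eyeballing a few queries.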

Doug
