On 1/11/06, Klaus <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> do you know how the tf and idf values are computed by the default
> similarity? I mean the exact mathematical equation.

Well, here is the default Similarity:

/** Expert: Default scoring implementation. */
public class DefaultSimilarity extends Similarity {

  /** Implemented as <code>1/sqrt(numTerms)</code>. */
  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
  }

  /** Implemented as <code>1/sqrt(sumOfSquaredWeights)</code>. */
  public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  /** Implemented as <code>sqrt(freq)</code>. */
  public float tf(float freq) {
    return (float)Math.sqrt(freq);
  }

  /** Implemented as <code>1 / (distance + 1)</code>. */
  public float sloppyFreq(int distance) {
    return 1.0f / (distance + 1);
  }

  /** Implemented as <code>log(numDocs/(docFreq+1)) + 1</code>. */
  public float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }

  /** Implemented as <code>overlap / maxOverlap</code>. */
  public float coord(int overlap, int maxOverlap) {
    return overlap / (float)maxOverlap;
  }
}

If you really want to understand how scoring works, I'd suggest also
looking at TermWeight/TermScorer.

-Yonik
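
To see what the tf and idf formulas give for concrete numbers, here is a small standalone sketch. It is not Lucene code: the class name SimilaritySketch and the sample inputs are made up for illustration, and only the three formulas are copied from the DefaultSimilarity methods quoted above.

```java
// Standalone sketch: formulas copied from the DefaultSimilarity methods
// quoted above. The class name and sample inputs are hypothetical.
class SimilaritySketch {

  // tf: square root of the term's frequency within one document.
  static float tf(float freq) {
    return (float) Math.sqrt(freq);
  }

  // idf: natural log of numDocs / (docFreq + 1), plus 1.
  static float idf(int docFreq, int numDocs) {
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
  }

  // lengthNorm: 1 / sqrt(number of terms in the field).
  static float lengthNorm(int numTerms) {
    return (float) (1.0 / Math.sqrt(numTerms));
  }

  public static void main(String[] args) {
    // A term occurring 4 times in a document.
    System.out.println("tf(4)          = " + tf(4.0f));
    // A term found in 9 documents of a 1000-document index.
    System.out.println("idf(9, 1000)   = " + idf(9, 1000));
    // A field containing 16 terms.
    System.out.println("lengthNorm(16) = " + lengthNorm(16));
  }
}
```

With these inputs tf(4) is 2.0, lengthNorm(16) is 0.25, and idf(9, 1000) is ln(100) + 1, roughly 5.605. The logarithm is the natural log, since Math.log is used. How the factors are combined into a final score is not covered here; that part is in TermWeight/TermScorer.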