On 1/11/06, Klaus <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> do you know how the tf und idf values are computed by the default
> similarity? I mean the exact mathematical equation.
Well, here is the default Similarity:
/** Expert: Default scoring implementation. */
public class DefaultSimilarity extends Similarity {
/** Implemented as <code>1/sqrt(numTerms)</code>. */
public float lengthNorm(String fieldName, int numTerms) {
return (float)(1.0 / Math.sqrt(numTerms));
}
/** Implemented as <code>1/sqrt(sumOfSquaredWeights)</code>. */
public float queryNorm(float sumOfSquaredWeights) {
return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
}
/** Implemented as <code>sqrt(freq)</code>. */
public float tf(float freq) {
return (float)Math.sqrt(freq);
}
/** Implemented as <code>1 / (distance + 1)</code>. */
public float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
/** Implemented as <code>log(numDocs/(docFreq+1)) + 1</code>. */
public float idf(int docFreq, int numDocs) {
return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
/** Implemented as <code>overlap / maxOverlap</code>. */
public float coord(int overlap, int maxOverlap) {
return overlap / (float)maxOverlap;
}
}
If you really want to understand how scoring works, I'd suggest also
looking at TermWeight/TermScorer.
-Yonik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]