Following up on my earlier post (quoted below): there seems to be a bug in
Lucene 1.9.1. I used the Similarity from that post and changed idf to:
    public float idf(int docFreq, int numDocs) {
        float f = (float)(Math.log((double)numDocs/(double)(docFreq+1) + 1.0));
        return f;
    }
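To make the numbers easier to reason about, here is a small standalone sketch that evaluates the expression above for a few document frequencies. It does not use any Lucene classes; the class name IdfSketch, the method names, and the sample counts are made up for illustration. The second method puts the 1.0 outside the logarithm, which is how I believe the stock DefaultSimilarity does it, but treat that as an assumption. Note that with docFreq = 0 the expression is still positive, so a term that appears in no document still gets a nonzero idf.

```java
// Standalone sketch: no Lucene classes are used. IdfSketch and the sample
// counts below are hypothetical and exist only for illustration.
final class IdfSketch {

    // The variant from this post: the +1.0 sits inside the logarithm.
    static float postIdf(int docFreq, int numDocs) {
        return (float) Math.log((double) numDocs / (double) (docFreq + 1) + 1.0);
    }

    // Assumed shape of the stock formula: the +1.0 is added after the logarithm.
    static float outsideIdf(int docFreq, int numDocs) {
        return (float) (Math.log((double) numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 100;                // hypothetical index size
        int[] docFreqs = {0, 9, 99};      // hypothetical document frequencies
        for (int docFreq : docFreqs) {
            System.out.println("docFreq=" + docFreq
                    + " inside=" + postIdf(docFreq, numDocs)
                    + " outside=" + outsideIdf(docFreq, numDocs));
        }
    }
}
```

Both variants shrink as docFreq grows, but they are not interchangeable: for docFreq = 9 and numDocs = 100 the first evaluates ln(11) and the second ln(10) + 1.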
Now, when I print the explanation for the top doc ID, it lists every
term in the query twice, each with a raw score of 11.50651, even though
some of those terms don't appear in any document. Yet the maximum raw
score of the top doc is only 4.12327.
Has anyone encountered this before?
Thanks
Eugene wrote:
Hi,
I tried implementing my own Similarity and setting it with
IndexWriter.setSimilarity(new CosSimilarity()).
But something is odd: Lucene doesn't seem to call the methods in my
Similarity. For example, when I make idf return 0.0f, the search
still gives me a score > 0.0f.
How do I set the Similarity correctly? I'm quite new to this, so links
on implementing a Similarity would also be useful.
Thanks.
--
Eugene
Here's the code for my CosSimilarity:
import org.apache.lucene.search.Similarity;

public class CosSimilarity extends Similarity
{
    // Ignore field length: every field gets the same norm.
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    public float queryNorm(float sumOfSquaredWeights) {
        return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public float tf(float freq) {
        return (float)(1 + Math.log(1 + freq));
    }

    public float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public float idf(int docFreq, int numDocs) {
        float f = (float)(Math.log((double)numDocs/(double)(docFreq+1) + 1.0));
        System.out.println("CosSimilarity.idf>" + f);
        // Deliberately return 0 (not f) to test whether this method is used.
        return 0.0f;
    }

    public float coord(int overlap, int maxOverlap) {
        return overlap / (float)maxOverlap;
    }
}
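One way to check the arithmetic of these methods in isolation, without Lucene on the classpath, is a standalone sketch like the one below. The class name SimilarityCheck and the sample inputs are made up for illustration; the method bodies are copied from CosSimilarity above. It says nothing about whether Lucene calls the methods, only what they return for given inputs. One detail it makes visible: queryNorm divides by the square root of its argument, so an input of 0 yields positive infinity rather than 0.

```java
// Standalone sketch: copies the method bodies from CosSimilarity so they can
// be exercised without Lucene. SimilarityCheck and the inputs are hypothetical.
final class SimilarityCheck {

    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    static float tf(float freq) {
        return (float) (1 + Math.log(1 + freq));
    }

    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println("tf(0)          = " + tf(0.0f));
        System.out.println("tf(3)          = " + tf(3.0f));
        System.out.println("sloppyFreq(1)  = " + sloppyFreq(1));
        System.out.println("coord(1, 2)    = " + coord(1, 2));
        System.out.println("queryNorm(4)   = " + queryNorm(4.0f));
        // Division by sqrt(0): floating-point arithmetic gives Infinity here.
        System.out.println("queryNorm(0)   = " + queryNorm(0.0f));
    }
}
```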
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]