Hello,
I am trying to implement my own Jaccard similarity for Lucene.
So far I have the following code:
public class JaccardSimilarity extends DefaultSimilarity {

    int numberOfDocumentTerms;
    // String field = "contents"; // Should the Jaccard similarity be based only on the contents field?

    // Ignore inverse document frequency.
    @Override
    public float idf(int i, int i1) {
        return 1;
    }

    // Ignore term frequency: a term either occurs or it does not.
    @Override
    public float tf(int i) {
        return 1;
    }

    public int getNumberOfDocumentTerms() {
        return numberOfDocumentTerms;
    }

    public void setNumberOfDocumentTerms(int numberOfDocumentTerms) {
        this.numberOfDocumentTerms = numberOfDocumentTerms;
    }

    @Override
    public float queryNorm(float i) {
        return 1.0f;
    }

    @Override
    public float computeNorm(String field, FieldInvertState state) {
        // For each field we get the number of terms.
        numberOfDocumentTerms = state.getLength();
        setNumberOfDocumentTerms(numberOfDocumentTerms);
        System.out.println("numberOfDocumentTerms from compute : " + numberOfDocumentTerms);
        return 1.0f;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        System.out.println("numberOfDocumentTerms : " + getNumberOfDocumentTerms());
        // Cast to float: plain int division would truncate the ratio to 0 or 1.
        return (float) overlap / (numberOfDocumentTerms + (maxOverlap - overlap));
    }
}
The problem is that the coord() method does not seem to be called (at least
as far as I can tell), neither during searching nor during indexing.
What am I doing wrong? I need
|overlap| - the number of query terms matched in the document
|maxOverlap| - the total number of terms in the query
to implement my scoring.
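To make the target formula concrete, here is a minimal standalone sketch (plain Java, no Lucene involved) of what I would like coord() to end up computing. The class name JaccardSketch and its methods are made up for illustration only, and the sketch assumes the document term count is the number of distinct terms in the document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: Jaccard similarity from term counts.
final class JaccardSketch {

    // |Q ∩ D| / |Q ∪ D|, using |Q ∪ D| = |D| + |Q| - |Q ∩ D|.
    // overlap       = query terms matched in the document
    // maxOverlap    = total number of terms in the query
    // documentTerms = distinct terms in the document (assumption)
    static float jaccard(int overlap, int maxOverlap, int documentTerms) {
        int union = documentTerms + (maxOverlap - overlap);
        if (union == 0) {
            return 0.0f; // both sets empty: define the similarity as 0
        }
        return (float) overlap / union; // float division, no truncation
    }

    // Same computation starting from explicit term sets.
    static float jaccard(Set<String> queryTerms, Set<String> docTerms) {
        Set<String> common = new HashSet<>(queryTerms);
        common.retainAll(docTerms); // keep only the shared terms
        return jaccard(common.size(), queryTerms.size(), docTerms.size());
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("lucene", "jaccard", "score"));
        Set<String> doc = new HashSet<>(
                Arrays.asList("lucene", "jaccard", "index", "field", "term"));
        // 2 shared terms, 6 distinct terms overall, so the ratio is 2 / 6.
        System.out.println(jaccard(query, doc));
    }
}
```

The count-based overload takes the same two values that coord(overlap, maxOverlap) receives, plus the per-document term count, which is the value I am trying to get hold of at search time.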
Any help would be highly appreciated.
Thank you in advance!