Hi,
I want to use Lucene with the following scoring logic:
When I index my documents, I want to set a score/weight for each field.
When I query my index, I want to set a score/weight for each query term.
I will NEVER index or query with multiple instances of the same field: each
query (and each document) will contain at most one instance of any field name.
My fields and query terms are not analyzed; each already consists of a single token.
I want the score to be simply the dot product of the query's field weights and
the document's field weights, counting only fields that have the same value in both.
For example:
Query:
Field Name   Field Value   Field Score
1            AA            0.1
7            BB            0.2
8            CC            0.3
Document 1:
Field Name   Field Value   Field Score
1            AA            0.2
2            DD            0.8
7            CC            0.999
10           FFF           0.1
Document 2:
Field Name   Field Value   Field Score
7            BB            0.3
8            CC            0.5
The scores should be:
Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q *
FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.06 + 0.15 = 0.21
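To make the expected numbers concrete, here is a minimal plain-Java sketch of the score I am after. It does not use Lucene at all; the names DotProductScore, WeightedValue, score and the example* helpers are hypothetical and exist only for this illustration. It assumes at most one value per field name, as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the desired score; no Lucene involved.
// All names here are hypothetical, chosen only for this example.
public class DotProductScore {

    // A single-token field value together with its weight.
    static final class WeightedValue {
        final String value;
        final double weight;

        WeightedValue(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    // Sums queryWeight * docWeight over the fields that appear in both maps
    // with the same field name and the same field value.
    static double score(Map<String, WeightedValue> query, Map<String, WeightedValue> doc) {
        double sum = 0.0;
        for (Map.Entry<String, WeightedValue> entry : query.entrySet()) {
            WeightedValue q = entry.getValue();
            WeightedValue d = doc.get(entry.getKey());
            if (d != null && d.value.equals(q.value)) {
                sum += q.weight * d.weight;
            }
        }
        return sum;
    }

    // The query from the example, keyed by field name.
    static Map<String, WeightedValue> exampleQuery() {
        Map<String, WeightedValue> m = new HashMap<>();
        m.put("1", new WeightedValue("AA", 0.1));
        m.put("7", new WeightedValue("BB", 0.2));
        m.put("8", new WeightedValue("CC", 0.3));
        return m;
    }

    // Document 1 from the example.
    static Map<String, WeightedValue> exampleDoc1() {
        Map<String, WeightedValue> m = new HashMap<>();
        m.put("1", new WeightedValue("AA", 0.2));
        m.put("2", new WeightedValue("DD", 0.8));
        m.put("7", new WeightedValue("CC", 0.999));
        m.put("10", new WeightedValue("FFF", 0.1));
        return m;
    }

    // Document 2 from the example.
    static Map<String, WeightedValue> exampleDoc2() {
        Map<String, WeightedValue> m = new HashMap<>();
        m.put("7", new WeightedValue("BB", 0.3));
        m.put("8", new WeightedValue("CC", 0.5));
        return m;
    }

    public static void main(String[] args) {
        // Only field 1 matches in document 1; fields 7 and 8 match in document 2.
        System.out.println(score(exampleQuery(), exampleDoc1())); // approximately 0.02
        System.out.println(score(exampleQuery(), exampleDoc2())); // approximately 0.21
    }
}
```

Note that field 7 of document 1 contributes nothing, because its value (CC) differs from the query's value (BB) for that field.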
What would be the best way to implement this in terms of accuracy and performance?
(I don't need the TF and IDF calculations.)
I currently implement it by setting boosts on the fields and on the query terms.
I then overrode the DefaultSimilarity class:
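import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.DefaultSimilarity;

public class MySimilarity extends DefaultSimilarity {
    // Use only the index-time field boost as the norm (no length normalization).
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        return state.getBoost();
    }

    // Disable query normalization.
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1;
    }

    // Ignore term frequency.
    @Override
    public float tf(float freq) {
        return 1;
    }

    // Ignore inverse document frequency.
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1;
    }

    // Ignore the coordination factor.
    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1;
    }
}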
Based on the scoring documentation at
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
this should work.
Problems:
1. Performance: Lucene still computes all the TF/IDF factors and the norms, which I don't need.
2. The score I get from the TopScoreDocCollector is not the same as the one I get
from the Explanation.
Here is part of my code:
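import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;

// Open the index read-only and collect the top iTopN hits.
indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
indexSearcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = indexSearcher.doc(docId);
    // Score as reported by the collector.
    double score = hits[i].score;
    String id = d.get(FIELD_ID);
    // Score as explained by Lucene; this is the value that differs from hits[i].score.
    Explanation explanation = indexSearcher.explain(query, docId);
}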
Thanks!