> yah, before this i used default lucene...but i dont know
> what end up wrong...some results with only single word matching when to
> the top of the results.
Hmm. Interesting. It seems that length normalization causing this. Very short
documents with only single word matching getting high score due to length
normalization. The documents containing all of the query terms are probably
very long and getting lower score. Lucene punishes long documents, and favors
short documents.
Can you verify/confirm my guess looking at the document lengths of the result
set? Also org.apache.lucene.search.Explanation describes the score computation
for document and query.
There is an excellent publication [1] [2] (in section 4.1 and 4.2) about lucene
score modification. SweetSpotSimilarity [3] with the appropriate parameters
(steepness, min, and max) can solve your problem.
Alternatively if your requirement is very important (you don't care about long
documents taking over) then you can try to extend the DefaultSimilarity so that
it will ignore the document length. Just return 1.
public float lengthNorm(String fieldName, int numTerms) {
return 1.0f;
}
> This i assumed is due to the score of the result being to
> high. Tat's why i am trying to add additional boost
I don't think there exists such a boosting mechanism.
Ahmet
[1]
http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team
[2]http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
[3]http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/misc/SweetSpotSimilarity.html
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]