> yah, before this i used default lucene...but i dont know > what end up wrong...some results with only single word matching when to > the top of the results.
Hmm. Interesting. It seems that length normalization causing this. Very short documents with only single word matching getting high score due to length normalization. The documents containing all of the query terms are probably very long and getting lower score. Lucene punishes long documents, and favors short documents. Can you verify/confirm my guess looking at the document lengths of the result set? Also org.apache.lucene.search.Explanation describes the score computation for document and query. There is an excellent publication [1] [2] (in section 4.1 and 4.2) about lucene score modification. SweetSpotSimilarity [3] with the appropriate parameters (steepness, min, and max) can solve your problem. Alternatively if your requirement is very important (you don't care about long documents taking over) then you can try to extend the DefaultSimilarity so that it will ignore the document length. Just return 1. public float lengthNorm(String fieldName, int numTerms) { return 1.0f; } > This i assumed is due to the score of the result being to > high. Tat's why i am trying to add additional boost I don't think there exists such a boosting mechanism. Ahmet [1] http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team [2]http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf [3]http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/misc/SweetSpotSimilarity.html --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org