Re: Generating Query for Multiple Clauses in a Single Field

AHMET ARSLAN Thu, 30 Jul 2009 03:43:49 -0700

> Yes, before this I used the default Lucene scoring, but I don't know
> what went wrong: some results with only a single word matching went to
> the top of the results.


Hmm, interesting. It seems that length normalization is causing this. Very
short documents that match only a single word get a high score because of
length normalization, while the documents containing all of the query terms
are probably very long and get a lower score. Lucene penalizes long documents
and favors short ones.
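
To make the effect concrete, here is a minimal sketch in plain Java (no Lucene
dependency). It assumes the classic default length norm of 1/sqrt(numTerms)
and a deliberately simplified toy score (number of matching terms times the
norm) that ignores tf, idf, coord and boosts. The class and method names are
hypothetical and only for illustration.

```java
// Illustration only: a toy model of length normalization, not Lucene code.
final class LengthNormSketch {

    // Assumed default norm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical toy score: matching query terms scaled by the length norm.
    // Real Lucene scoring also involves tf, idf, coord and boosts.
    static float toyScore(int matchingTerms, int numTerms) {
        return matchingTerms * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A 4-term document matching 1 query term.
        float shortDoc = toyScore(1, 4);
        // A 400-term document matching all 3 query terms.
        float longDoc = toyScore(3, 400);
        System.out.println("short doc: " + shortDoc);
        System.out.println("long doc:  " + longDoc);
        System.out.println("short doc ranks first: " + (shortDoc > longDoc));
    }
}
```

In this toy model the 4-term document gets a norm of 0.5 while the 400-term
document gets 0.05, so even three matches cannot compensate for its length.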

Can you confirm my guess by looking at the document lengths in the result
set? Also, org.apache.lucene.search.Explanation describes the score
computation for a given document and query.

There is an excellent publication [1] [2] (see sections 4.1 and 4.2) about
modifying Lucene scoring. SweetSpotSimilarity [3] with appropriate parameters
(steepness, min, and max) can solve your problem.
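
As a rough sketch of what those three parameters do, the plain Java below
reproduces the length norm formula given in the SweetSpotSimilarity javadoc,
1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1). It is not the
Lucene class itself; the class name and the parameter values (min 100,
max 1000, steepness 0.5) are assumptions chosen only for illustration.

```java
// Illustration only: the sweet-spot length norm formula, not Lucene code.
final class SweetSpotSketch {

    // Norm is 1.0 for lengths inside [min, max] and decays outside that range;
    // a larger steepness makes the decay faster.
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        int distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // Hypothetical parameters: documents of 100 to 1000 terms are "ideal".
        int min = 100;
        int max = 1000;
        float steepness = 0.5f;
        for (int len : new int[] {4, 100, 500, 1000, 4000}) {
            System.out.println(len + " terms -> norm " + lengthNorm(len, min, max, steepness));
        }
    }
}
```

With these values every document between 100 and 1000 terms gets the same
norm, and very short documents are penalized as well, so a 4-term document
matching a single word no longer gets an advantage from its length.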

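Alternatively, if this requirement is very important to you (that is, you
don't mind long documents taking over), you can extend DefaultSimilarity so
that it ignores the document length. Just return 1 from lengthNorm. Norms are
computed at index time, so the documents need to be re-indexed with this
similarity for the change to take effect.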

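  // In a subclass of DefaultSimilarity: give every field the same norm,
  // so document length no longer affects the score.
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    return 1.0f;
  }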


> I assumed this is due to the score of the result being too
> high. That's why I am trying to add an additional boost.

I don't think there exists such a boosting mechanism.

Ahmet

[1] http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team
[2] http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
[3] http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/misc/SweetSpotSimilarity.html




      
