See below.

Also, there is new Scoring documentation available via the website (http://lucene.apache.org/java/docs/scoring.html) that covers scoring in some detail.

On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote:

Hi.

I have a question regarding the Lucene scoring algorithm. Provided I have a
query "a OR b OR c OR d OR e OR f", and two documents: doc1 "a b c d"
and doc2 "d e", will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document in
the case of an 'or' query?


Yes, it should score higher. See the coord() factor in Similarity, which rewards documents that match more of the query's terms.
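To make the coord() point concrete, here is a minimal sketch in plain Java (no Lucene dependency) that applies the ratio used by the default coord(overlap, maxOverlap), as far as I understand it: matched query terms divided by total query terms. The class name CoordSketch, the helper overlap(), and the whitespace tokenization are assumptions made up for this illustration. It shows only the coord factor, not the full score, which also depends on tf, idf, and length norms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustration only: mimics the coord factor, not Lucene's full scoring.
public class CoordSketch {

    // Ratio of matched query terms to total query terms
    // (assumed to mirror the default coord() formula).
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: counts distinct query terms present in a
    // whitespace-tokenized document.
    static int overlap(Set<String> queryTerms, String docText) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(docText.split("\\s+")));
        docTerms.retainAll(queryTerms);
        return docTerms.size();
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("a", "b", "c", "d", "e", "f"));
        String doc1 = "a b c d";
        String doc2 = "d e";

        // doc1 matches 4 of 6 query terms, doc2 matches 2 of 6.
        float coord1 = coord(overlap(query, doc1), query.size());
        float coord2 = coord(overlap(query, doc2), query.size());

        System.out.println("doc1 coord = " + coord1);
        System.out.println("doc2 coord = " + coord2);
    }
}
```

With the example query, doc1 gets the larger coord factor, so all else being equal it is boosted relative to doc2.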

Given that I don't know the algorithms behind Lucene, how does
'or' query time depend on the number of searched terms? Does it grow
linearly, exponentially? How does 'and' query time depend on the number
of searched terms? (It should decrease, right?)


I'm not 100% sure on this, but that does make sense, and I think it is pretty simple to test. We are working on some benchmarks, and this may be a good one to add to them.



--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886



