See below.
Also, there is new Scoring documentation available via the website
(http://lucene.apache.org/java/docs/scoring.html) that covers scoring
in some detail.
On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote:
Hi.
I have a question regarding the Lucene scoring algorithm. Given the
query "a OR b OR c OR d OR e OR f" and two documents, doc1 "a b c d"
and doc2 "d e", will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document
in the case of an 'or' query?
Yes, doc1 should score higher. See the coord() factor in Similarity,
which rewards documents that match more of the query terms.
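As a rough illustration of the coord() idea applied to the two example
documents, here is a minimal sketch. It assumes the default definition
coord = overlap / maxOverlap. CoordSketch and its overlap() helper are
hypothetical names made up for this example, not Lucene classes, and
the real score also multiplies in tf, idf, and norm factors that are
left out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CoordSketch {
    // Assumed default coord: fraction of the query terms that the document matches.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: counts the query terms that occur in a
    // whitespace-tokenized document (no analysis, stemming, or stop words).
    static int overlap(String[] queryTerms, String docText) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(docText.split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String[] query = {"a", "b", "c", "d", "e", "f"};
        float doc1 = coord(overlap(query, "a b c d"), query.length); // 4 of 6 terms
        float doc2 = coord(overlap(query, "d e"), query.length);     // 2 of 6 terms
        System.out.println("doc1 coord = " + doc1);
        System.out.println("doc2 coord = " + doc2);
    }
}
```

With everything else equal, the larger coord value for doc1 is what
pushes it above doc2.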
Given that I don't know the algorithms behind Lucene, how does
'or' query time depend on the number of searched terms? Does it grow
linearly, exponentially? How does 'and' query time depend on the
number of searched terms? (It should decrease, right?)
I'm not 100% sure on this, but that does make sense, and it should be
pretty simple to test out. We are working on some benchmarks, and this
may be a good one to add to them.
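To make the intuition concrete, here is a toy model of term matching
over sorted posting lists (arrays of document ids). This is only a
sketch under stated assumptions, not how Lucene actually implements
boolean queries. PostingsSketch and its methods are hypothetical names,
and each list is assumed to be sorted and there is assumed to be at
least one term. In this model, OR visits every posting of every term,
so its work grows with the total number of postings, while AND only
walks the shortest list and probes the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PostingsSketch {
    // OR in the toy model: every posting of every term is visited once.
    static List<Integer> union(int[][] postings) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (int[] list : postings) {
            for (int doc : list) {
                docs.add(doc);
            }
        }
        return new ArrayList<>(docs);
    }

    // AND in the toy model: walk the shortest list and binary-search the rest.
    // Assumes postings is non-empty and every list is sorted ascending.
    static List<Integer> intersection(int[][] postings) {
        int[] shortest = postings[0];
        for (int[] list : postings) {
            if (list.length < shortest.length) {
                shortest = list;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int doc : shortest) {
            boolean inAll = true;
            for (int[] list : postings) {
                if (Arrays.binarySearch(list, doc) < 0) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 3, 5, 7}, {2, 3, 7}, {3, 7, 9}};
        System.out.println("OR  matches: " + union(postings));
        System.out.println("AND matches: " + intersection(postings));
    }
}
```

Real timings depend on the index and the term frequencies, so a
benchmark is still the way to settle the question.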
--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886