I have seen different versions of Lucene's ranking function
in the Similarity documentation and on the Lucene user list.

I need document-document similarities, so I issue each
document directly as a query. I found that issuing
"computer computer" to Lucene gives a different result from
issuing it to a standard VSM. The standard VSM treats
"computer computer" the same as "computer", but Lucene
does not.

To state my question more clearly, I have written
a more formal document

http://www.cs.virginia.edu/~xj3a/lucene_ranking.pdf

so that there is no ambiguity about the formulas.

I am not sure whether I understand this correctly, but the
main cause seems to be Lucene's query parser. It assumes
that each term appears once in the query. If a term is
repeated in the query string, the results can be
unexpected.
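
To make the difference concrete, here is a minimal Python sketch.
It is not Lucene's scoring code. The function summed_clause_score
is a hypothetical simplification that I made up for illustration:
it assumes every occurrence of a query term becomes its own clause
with frequency 1, that clause scores are added, and that nothing
normalizes the query. The cosine function is the usual VSM cosine
over raw term frequencies, without idf.

```python
import math
from collections import Counter


def cosine(query_tokens, doc_tokens):
    """Standard VSM cosine similarity over raw term-frequency vectors."""
    q = Counter(query_tokens)
    d = Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


def summed_clause_score(query_tokens, doc_tokens):
    """Hypothetical model (an assumption, not Lucene's formula):
    each query token is scored as a separate clause with frequency 1,
    clause scores are summed, and the query is not length-normalized."""
    d = Counter(doc_tokens)
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    if d_norm == 0:
        return 0.0
    return sum(d[t] for t in query_tokens) / d_norm


doc = ["computer", "science", "computer"]

# Cosine: repeating the only query term rescales the query vector,
# and the query norm cancels that rescaling.
vsm_once = cosine(["computer"], doc)
vsm_twice = cosine(["computer", "computer"], doc)

# Summed clauses: the repeated term is counted twice.
sum_once = summed_clause_score(["computer"], doc)
sum_twice = summed_clause_score(["computer", "computer"], doc)

print("cosine:        ", vsm_once, vsm_twice)
print("summed clauses:", sum_once, sum_twice)
```

Under these assumptions the two cosine values are equal, while the
summed-clause score doubles when the term is repeated. Whether
Lucene behaves like the second function depends on how the query
parser and the Similarity implementation handle repeated terms,
which is exactly what I am asking about.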

For details, please refer to the link above.

thanks

xiangyu jin
