Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term?  Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms two vectors have in common.

>>> David Spencer <[EMAIL PROTECTED]> 06/01/04 08:25PM >>>
Erik Hatcher wrote:

> On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
>
>> Well, a question again, how does Lucene compute the score between a 

>> document and a query?
>

And I might add, thus, this approach to similarity gives more weight to

rare terms that match, which one might want for this kind of similarity

measure.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to