Hmm, the term vector does not have to consist of only term frequencies, does it? To give weight to rare terms, could you create a term vector of TF*IDF values, one per term? A distance function over those vectors would then measure how many terms two vectors have in common, while giving extra weight to the rare terms they share.
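To make the idea concrete, here is a rough sketch in plain Python (not Lucene code, and not how Lucene itself scores). It assumes documents are already tokenized, uses a smoothed idf of log(1 + N/df), and uses cosine similarity as the comparison function. The function names and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption of this sketch): log(1 + n/df) is
        # always positive, and it is larger for rarer terms.
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents. Doc 0 shares only the rare term "lucene"
# with doc 1, and only the common term "the" with doc 2.
docs = [
    ["the", "cat", "lucene"],
    ["a", "dog", "lucene"],
    ["the", "bird", "sings"],
    ["the", "fish", "swims"],
    ["the", "horse", "runs"],
]
vecs = tfidf_vectors(docs)
rare_match = cosine(vecs[0], vecs[1])
common_match = cosine(vecs[0], vecs[2])
print(rare_match > common_match)
```

With raw term frequencies both pairs would score the same, since each pair shares exactly one term out of three. With TF*IDF weights the pair that shares the rare term comes out ahead, which is the effect described above.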
>>> David Spencer <[EMAIL PROTECTED]> 06/01/04 08:25PM >>>
Erik Hatcher wrote:
> On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
>
>> Well, a question again, how does Lucene compute the score between a
>> document and a query?
>
> And I might add, thus, this approach to similarity gives more weight to rare terms that match, which one might want for this kind of similarity measure.