Hello Lucene people!
First of all, I would like to thank all community participants
(developers, users, and Erik and Otis for the "Lucene in Action" book)
for their great work.

As far as I understand it, there are two popular approaches to
document similarity:
1. a "cosine metric" computed from term vectors
2. constructing a MoreLikeThis query from a document
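For reference, approach 1 can be sketched roughly as follows. This is a minimal sketch that assumes each document's term vector has already been read into a plain term-to-frequency map. How that map is obtained from Lucene's stored term vectors depends on the Lucene version, so that part is left out. The class and method names are made up for illustration, and raw term frequencies are used instead of tf-idf weights to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: cosine similarity between two term-frequency vectors.
public class CosineSimilarity {

    // Cosine of the angle between two sparse vectors stored as term -> frequency.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other; // only shared terms contribute
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 2);
        doc1.put("index", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 1);
        doc2.put("search", 1);
        System.out.println(cosine(doc1, doc2));
    }
}
```

The weakness is visible in the signature: every comparison needs two full vectors, so comparing a new document against the whole collection this way costs one call per stored document.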

In my case, I need to filter out similar documents from search results,
and therefore I want to determine document similarity during the
indexing process using term vectors. Obviously, I can't compare the
document currently being indexed with every document in my collection.
Should I narrow down the set of candidate documents by constructing
some kind of "LikeThis" query? What are the best/common practices for
such things?
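To make the question concrete, here is a rough sketch of the kind of candidate restriction I have in mind. It picks a few characteristic terms of the new document (roughly the idea behind Lucene's MoreLikeThis class), collects only the documents that contain at least one of them, and runs the cosine comparison on that small set. This is an illustration under assumptions, not Lucene API: the index is replaced by a hypothetical in-memory term-to-document map so the code runs standalone, and the names `CandidateDedup`, `topTerms` and `nearDuplicates`, the smoothed idf formula, the number of top terms and the 0.8 threshold are all made up.

```java
import java.util.*;

// Hypothetical near-duplicate check: restrict candidates by top terms, then compare by cosine.
public class CandidateDedup {
    // Stand-in for an inverted index: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Stand-in for stored term vectors: doc id -> (term -> frequency).
    private final Map<Integer, Map<String, Integer>> vectors = new HashMap<>();

    void add(int id, Map<String, Integer> tf) {
        vectors.put(id, tf);
        for (String term : tf.keySet()) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    // Pick the n terms with the highest tf * idf (assumed smoothed idf, not Lucene's exact formula).
    List<String> topTerms(Map<String, Integer> tf, int n) {
        int numDocs = Math.max(vectors.size(), 1);
        List<String> terms = new ArrayList<>(tf.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> {
            int df = postings.getOrDefault(t, Collections.emptySet()).size();
            double idf = Math.log((double) (numDocs + 1) / (df + 1)) + 1.0;
            return tf.get(t) * idf;
        }).reversed());
        return terms.subList(0, Math.min(n, terms.size()));
    }

    // Ids of indexed docs with cosine >= threshold, checking only docs that share a top term.
    List<Integer> nearDuplicates(Map<String, Integer> tf, int n, double threshold) {
        Set<Integer> candidates = new TreeSet<>();
        for (String term : topTerms(tf, n)) {
            candidates.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        List<Integer> result = new ArrayList<>();
        for (int id : candidates) {
            if (cosine(tf, vectors.get(id)) >= threshold) {
                result.add(id);
            }
        }
        return result;
    }

    // Cosine similarity between two term-frequency maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += (double) e.getValue() * e.getValue();
            Integer o = b.get(e.getKey());
            if (o != null) dot += (double) e.getValue() * o;
        }
        for (int f : b.values()) nb += (double) f * f;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Naive whitespace tokenizer, standing in for a real Lucene analyzer.
    static Map<String, Integer> tf(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) m.merge(w, 1, Integer::sum);
        }
        return m;
    }

    public static void main(String[] args) {
        CandidateDedup idx = new CandidateDedup();
        idx.add(1, tf("lucene term vectors document similarity"));
        idx.add(2, tf("cooking pasta with tomato sauce"));
        System.out.println(idx.nearDuplicates(tf("document similarity with lucene term vectors"), 3, 0.8));
    }
}
```

With this toy data only document 1 is reported: document 2 shares just the word "with" with the new text, so even if it becomes a candidate its cosine score stays far below the threshold. The open question is how to choose the number of top terms and the threshold so that real near-duplicates are not missed by the candidate query.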

Thanks in advance,
Alex Serba
