See MoreLikeThis in the contrib/queries folder. It speeds up
similarity comparisons by using only the most significant words
from a document as search terms.
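
For reference, here is a minimal sketch in plain Java of the computation the question describes: one sparse TF-IDF vector per document, then cosine similarity between two vectors. It does not use Lucene at all. The class name TfIdfCosine, the method names, and the idf variant log(N/df) + 1 are my own assumptions for illustration, not Lucene's scoring formula or API, and documents are assumed to be already tokenized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
final class TfIdfCosine {

    // Builds one sparse vector (term -> TF-IDF weight) per tokenized document.
    public static List<Map<String, Double>> tfidf(List<List<String>> docs) {
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> vector = new HashMap<>();
            for (String term : doc) {
                vector.merge(term, 1.0, Double::sum); // raw term frequency
            }
            for (Map.Entry<String, Double> e : vector.entrySet()) {
                // Assumed idf variant: log(N / df) + 1.
                double idf = Math.log((double) docs.size() / df.get(e.getKey())) + 1.0;
                e.setValue(e.getValue() * idf);
            }
            vectors.add(vector);
        }
        return vectors;
    }

    // Cosine similarity of two sparse vectors; returns 0 if either is empty.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }
}
```

Calling TfIdfCosine.tfidf(docs) once and then TfIdfCosine.cosine(v.get(i), v.get(j)) for each pair fills the doc-doc matrix. That is a quadratic number of comparisons, so on a large collection it helps to keep only the highest-weighted terms per document, which is the idea behind MoreLikeThis.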
On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:
Hi,
It's my first experiment with Lucene. Please help me.
I'm going to index a set of documents and create a feature vector for
each of them. This vector contains all the terms belonging to the
document, weighted using TF-IDF.
After that I want to compute the cosine similarity between all
documents and produce a doc-doc similarity matrix. My document set
is large and it's important to have a scalable implementation.
Would you please give me some guidelines or a to-do list?
Thank you and kind regards.