Re: Doc-Doc Similarity Matrix Construction

Mark Harwood Mon, 29 Jun 2009 12:23:21 -0700

See MoreLikeThis in the contrib/queries folder. It optimizes the speed of similarity comparisons by taking only the most significant words from a document as search terms.
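
To illustrate the "most significant words only" idea, here is a minimal plain-Java sketch. It is not MoreLikeThis's actual code and does not use the Lucene API: the class SignificantTerms, the method topTerms, and the smoothed idf formula are assumptions chosen for illustration. It ranks a document's terms by tf * idf and keeps only the first few.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): picks the highest-weighted
// terms of one document so that only those are used for comparison.
final class SignificantTerms {

    static List<String> topTerms(List<String> doc, List<List<String>> corpus, int maxTerms) {
        // Term frequency within the document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }

        // Weight = tf * idf, with an assumed smoothed idf: log(N / (df + 1)) + 1.
        Map<String, Double> weight = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = 0;
            for (List<String> other : corpus) {
                if (other.contains(e.getKey())) {
                    df++;
                }
            }
            double idf = Math.log((double) corpus.size() / (df + 1)) + 1.0;
            weight.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending weight; break ties alphabetically for stable output.
        List<String> terms = new ArrayList<>(weight.keySet());
        terms.sort((a, b) -> {
            int byWeight = Double.compare(weight.get(b), weight.get(a));
            return byWeight != 0 ? byWeight : a.compareTo(b);
        });
        int keep = Math.max(0, Math.min(maxTerms, terms.size()));
        return new ArrayList<>(terms.subList(0, keep));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = new ArrayList<>();
        corpus.add(List.of("lucene", "lucene", "index", "the"));
        corpus.add(List.of("the", "cat", "sat"));
        corpus.add(List.of("the", "dog", "ran"));
        System.out.println(topTerms(corpus.get(0), corpus, 2));
    }
}
```

Comparing documents on a handful of high-weight terms instead of the full vocabulary trades some accuracy for speed, which is the trade-off described above. The document-frequency loop here is a linear scan for brevity; an inverted index provides those counts directly.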




On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:

Hi,
It's my first experiment with Lucene. Please help me.
I'm going to index a set of documents and create a feature vector for each of them. This vector contains all the terms that belong to the document, weighted using TF-IDF.
After that I want to compute the cosine similarity between all documents and produce a doc-doc similarity matrix. My document set is large and it's important to have a scalable implementation.
Would you please provide me a guideline or to-do list?
Thank you and kind regards.
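
For reference, the computation described in the question can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not a Lucene-based or scalable implementation: the class DocSimilarityMatrix, its method names, and the idf variant log(N / df) + 1 are hypothetical choices. Each document becomes a sparse, L2-normalised TF-IDF vector, so the cosine similarity of two documents is simply their dot product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sparse TF-IDF vectors and a dense doc-doc cosine matrix.
final class DocSimilarityMatrix {

    // Builds one L2-normalised TF-IDF vector (term -> weight) per document.
    static List<Map<String, Double>> tfidfVectors(List<List<String>> docs) {
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> v = new HashMap<>();
            for (String term : doc) {
                v.merge(term, 1.0, Double::sum); // raw term frequency
            }
            double sumSquares = 0.0;
            for (Map.Entry<String, Double> e : v.entrySet()) {
                // Assumed idf variant: log(N / df) + 1.
                double idf = Math.log((double) docs.size() / df.get(e.getKey())) + 1.0;
                e.setValue(e.getValue() * idf);
                sumSquares += e.getValue() * e.getValue();
            }
            double norm = Math.sqrt(sumSquares);
            if (norm > 0.0) {
                for (Map.Entry<String, Double> e : v.entrySet()) {
                    e.setValue(e.getValue() / norm);
                }
            }
            vectors.add(v);
        }
        return vectors;
    }

    // Dot product of two sparse vectors, iterating over the smaller one.
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Fills the symmetric n x n matrix; only the upper triangle is computed.
    static double[][] cosineMatrix(List<Map<String, Double>> vectors) {
        int n = vectors.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = dot(vectors.get(i), vectors.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(List.of("lucene", "index", "search"));
        docs.add(List.of("lucene", "search", "query"));
        docs.add(List.of("cat", "dog"));
        double[][] sim = cosineMatrix(tfidfVectors(docs));
        for (double[] row : sim) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The dense matrix needs n * n entries and roughly n * n / 2 dot products, so for a large document set it would likely be necessary to store only the strongest neighbours of each document, or to compare on a reduced set of significant terms.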



