The semanticvectors project might do what you are looking for.

Paul

On 11 Nov 2014, at 22:37, parnab kumar <parnab.2...@gmail.com> wrote:

> Hi,
>
> While indexing the documents, store the term vectors for the content
> field. Each document then has an array of terms and their corresponding
> frequencies in that document. You can retrieve these term vectors through
> the IndexReader. The similarity between two documents can then be computed
> as the similarity of their term vectors. Since tf-idf is the best-known
> weighting and tends to give a better sense of similarity, multiply each
> term's frequency by its idf to reweight the vectors.
>
> Thanks,
> Parnab
>
> On Tue, Nov 11, 2014 at 8:36 PM, Elshaimaa Ali <elshaimaa....@hotmail.com>
> wrote:
>
>> Hi All,
>> I have a Lucene index built with Lucene 4.9 for 584 text documents. I need
>> to extract a document-term matrix and a document-document similarity matrix
>> in order to use them to cluster the documents. My questions:
>> 1- How can I extract the matrix and compute the similarity between
>> documents in Lucene?
>> 2- Is there any Java-based code that can cluster the documents from a
>> Lucene index?
>>
>> Regards,
>> Shaimaa
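
The term-vector approach Parnab describes can be sketched in plain Java. This is a minimal sketch, not a Lucene API: it assumes the per-document term frequencies have already been read out of the index into maps. In Lucene 4.x that would typically mean indexing the "content" field with a FieldType that has setStoreTermVectors(true), then calling IndexReader.getTermVector(docId, "content") and walking the returned Terms with a TermsEnum, taking document frequencies from IndexReader.docFreq(Term) and the corpus size from IndexReader.numDocs() (worth checking against the 4.9 javadocs). The class name, method names and the plain idf = log(N / df) formula are illustrative assumptions; Lucene's DefaultSimilarity uses a slightly different idf.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: tf-idf reweighting of term vectors, cosine similarity,
 * and a document-document similarity matrix. Input maps are assumed to have
 * been filled from the index beforehand (e.g. via IndexReader.getTermVector).
 */
public class TermVectorSimilarity {

    /** Multiplies each raw term frequency by idf = log(numDocs / docFreq). */
    static Map<String, Double> tfIdf(Map<String, Long> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        Map<String, Double> weighted = new HashMap<>();
        for (Map.Entry<String, Long> e : termFreqs.entrySet()) {
            // Assumption: a term missing from docFreqs is treated as occurring in one document.
            int df = docFreqs.getOrDefault(e.getKey(), 1);
            double idf = Math.log((double) numDocs / df);
            weighted.put(e.getKey(), e.getValue() * idf);
        }
        return weighted;
    }

    /** Cosine similarity of two sparse vectors; returns 0 if either has zero norm. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Symmetric document-document similarity matrix over the weighted vectors. */
    static double[][] similarityMatrix(List<Map<String, Double>> docs) {
        int n = docs.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(docs.get(i), docs.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Toy term frequencies standing in for term vectors read from the index.
        Map<String, Long> d0 = new HashMap<>();
        d0.put("lucene", 3L);
        d0.put("index", 1L);
        Map<String, Long> d1 = new HashMap<>();
        d1.put("lucene", 1L);
        d1.put("cluster", 2L);
        Map<String, Long> d2 = new HashMap<>();
        d2.put("cluster", 2L);
        d2.put("matrix", 1L);

        // Document frequencies for the toy corpus of three documents.
        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 2);
        df.put("index", 1);
        df.put("cluster", 2);
        df.put("matrix", 1);

        List<Map<String, Double>> docs = new ArrayList<>();
        for (Map<String, Long> tf : List.of(d0, d1, d2)) {
            docs.add(tfIdf(tf, df, 3));
        }

        double[][] sim = similarityMatrix(docs);
        for (double[] row : sim) {
            StringBuilder sb = new StringBuilder();
            for (double v : row) {
                sb.append(String.format("%.3f ", v));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

The resulting matrix (or the weighted vectors themselves, as a document-term matrix) can then be handed to any clustering routine; computing only the upper triangle and mirroring it halves the work for the 584-document case.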