Re: Document Term matrix

Paul Libbrecht Tue, 11 Nov 2014 13:42:06 -0800

The project semanticvectors might be doing what you are looking for.
paul


On 11 nov. 2014, at 22:37, parnab kumar <[email protected]> wrote:

> hi,
> 
> When indexing the documents, store term vectors for the content
> field. For each document you will then have an array of terms and their
> corresponding frequencies in the document. Using the IndexReader you can
> retrieve these term vectors. The similarity between two documents can be
> computed as the similarity of their term vectors, for example with cosine
> similarity. Since tf-idf is the best-known weighting and tends to give a
> better sense of similarity, simply multiply the idf of each term by its
> frequency to reweight the vectors.
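
A minimal sketch of the reweighting and comparison described above, assuming the per-document term frequencies have already been read out of the index into plain maps (the Lucene retrieval step is not shown here). The class and method names are hypothetical, and idf is taken as log(N / df), which is one common variant and not necessarily the formula Lucene uses for scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tf-idf reweighting and cosine similarity over
// term-frequency maps (one map per document, term -> raw frequency).
class TfIdfSimilarity {

    // idf(term) = log(N / df), where df is the number of documents
    // containing the term. This variant is an assumption of the sketch.
    static Map<String, Double> idf(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    // Reweight one document: weight = tf * idf.
    static Map<String, Double> weigh(Map<String, Integer> tf, Map<String, Double> idf) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            weights.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return weights;
    }

    // Cosine similarity of two sparse vectors; returns 0 if either has zero norm
    // (which happens when every term of a document occurs in all documents).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Document-document similarity matrix over all pairs.
    static double[][] similarityMatrix(List<Map<String, Integer>> docs) {
        Map<String, Double> idf = idf(docs);
        List<Map<String, Double>> weighted = new ArrayList<>();
        for (Map<String, Integer> doc : docs) {
            weighted.add(weigh(doc, idf));
        }
        int n = docs.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                sim[i][j] = cosine(weighted.get(i), weighted.get(j));
                sim[j][i] = sim[i][j]; // the matrix is symmetric
            }
        }
        return sim;
    }
}
```

The resulting matrix could then be handed to a clustering routine. For 584 documents the full pairwise computation is small enough to hold in memory.
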
> 
> Thanks,
> Parnab
> 
> On Tue, Nov 11, 2014 at 8:36 PM, Elshaimaa Ali <[email protected]>
> wrote:
> 
>> Hi All,
>> I have a Lucene index built with Lucene 4.9 for 584 text documents. I need
>> to extract a document-term matrix and a document-document similarity matrix
>> in order to cluster the documents. My questions:
>> 1- How can I extract the matrix and compute the similarity between
>> documents in Lucene?
>> 2- Is there any Java-based code that can cluster the documents from a
>> Lucene index?
>> Regards,
>> Shaimaa
>> 
>> 


