Dear all, I'd like to cluster documents by their full text using Lucene. In other words, I would like to group similar documents together. I searched the mailing list and found two approaches. The first is to use a document as a query and search the collection with it. The second is to build a term vector for each document and compute the cosine similarity between the vectors. I found these methods here:
- http://www.mail-archive.com/[EMAIL PROTECTED]/msg04916.html

I would like to know whether there is a better way. Are there any built-in functions for clustering in the most recent release of Lucene? Thank you.

Kind regards,
Supheakmungkol
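For reference, the second approach (cosine similarity between term vectors) can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not Lucene code: each document is assumed to already be available as a map from term to raw frequency (in practice those counts would come from the term vectors stored in the index), the tokenizer `termFrequencies` is a hypothetical stand-in for a real Lucene analyzer, and raw term frequencies are used without any idf weighting.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Cosine similarity between two sparse term-frequency vectors,
    // each represented as a map from term to raw frequency.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        // Only terms present in both documents contribute to the dot product.
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Hypothetical helper (not a Lucene API): lowercase the text, split on
    // anything that is not a letter or digit, and count each token.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("Lucene document clustering with term vectors");
        Map<String, Integer> d2 = termFrequencies("clustering documents using Lucene term vectors");
        Map<String, Integer> d3 = termFrequencies("recipes for chocolate cake");
        System.out.printf("sim(d1, d2) = %.3f%n", cosine(d1, d2));
        System.out.printf("sim(d1, d3) = %.3f%n", cosine(d1, d3));
    }
}
```

Running `main` prints `sim(d1, d2) = 0.667` (4 shared terms out of 6 in each document) and `sim(d1, d3) = 0.000` (no shared terms). A clustering step would then group documents whose pairwise similarity is high; which grouping algorithm to use on top of these scores is a separate choice.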