> In this case, the document needs to be paired with the nearest cluster,
> right? Something like Canopy clustering should give a partially connected
> graph.
>
Just populate similarity values for the documents within each canopy. That
gives a very sparse but still connected graph, due to the overlapping nature
of canopy clustering.

> Robin
>
>> On Mon, Aug 16, 2010 at 7:00 AM, Robin Anil <[email protected]> wrote:
>>
>> > From a GSOC angle, it needn't be done; it's up to your mentor to decide.
>> > I am more interested in getting this completed and pushed out so that
>> > people can really use it. If you can spare time after GSOC, stay around
>> > the community, and help get this polished, that would be great.
>> >
>> > To create your pairwise similarity matrix (values in 0-1, where 1 means
>> > dissimilar; it can be the other way around as well), see the
>> > DistanceMeasure implementations. Creating the pairwise matrix is
>> > nontrivial from a scalability standpoint.
>> >
>> > A complete spectral clustering package should take an input set of
>> > documents, create the matrix, run clustering, and output the clusters.
>> > To get an idea of your work so far, which blocks are missing from this
>> > ideal package scenario?
>> >
>> > Robin
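
For illustration only, here is a rough sketch of the canopy idea discussed in this thread: build overlapping canopies with a loose and a tight threshold, then fill in pairwise values only for documents that share a canopy. Everything in it is an assumption made for the example: the function names (build_canopies, sparse_distances), the thresholds t1 and t2, and the toy term-count vectors are hypothetical, and cosine distance merely stands in for whichever DistanceMeasure implementation is actually used. It is plain single-machine Python, not Mahout code, and it does not address the scalability concern raised above.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance; in [0, 1] for non-negative vectors, 1 = dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an empty document as dissimilar to everything
    return 1.0 - dot / norm


def build_canopies(points, t1, t2, distance=cosine_distance):
    """Simplified canopy clustering; t1 is the loose threshold, t2 the tight one.

    Returns a list of canopies, each a sorted list of point indices.
    A point may appear in several canopies, which is what links them.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be larger than t2 (tight)")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        canopy = [i for i in range(len(points))
                  if i == center or distance(points[center], points[i]) <= t1]
        canopies.append(canopy)
        # Points within the tight threshold can no longer become centers.
        remaining = [i for i in remaining
                     if i != center and distance(points[center], points[i]) > t2]
    return canopies


def sparse_distances(points, canopies, distance=cosine_distance):
    """Compute distances only for pairs that share at least one canopy."""
    entries = {}
    for canopy in canopies:
        for i, j in combinations(canopy, 2):  # i < j because canopies are sorted
            if (i, j) not in entries:
                entries[(i, j)] = distance(points[i], points[j])
    return entries


# Hypothetical term-count vectors for five tiny documents.
docs = [
    [1, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 2, 1],
]

canopies = build_canopies(docs, t1=0.45, t2=0.3)
entries = sparse_distances(docs, canopies)
print(canopies)
print(sorted(entries))
```

On this toy data the sketch produces three overlapping canopies and stores 5 of the 10 possible pairs. The stored pairs still link all five documents, but whether the graph stays connected in general depends on the thresholds and the data, so it is worth checking rather than assuming.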
