> > > Anything but sparse connectivity is a complete non-starter for a scalable
> > > system.

Right, that's why I don't like the pairwise computation approach. In this
case, the document needs to be paired with the nearest cluster, right?
Something like Canopy clustering should give a partial connection graph.

Robin

> On Mon, Aug 16, 2010 at 7:00 AM, Robin Anil <[email protected]> wrote:
>
> > From a GSOC angle, it needn't be done; it's up to your mentor to decide.
> > I am interested more in getting this completed and pushed out so that
> > people can really use it. If you can spare time after GSOC and still hang
> > around the community and help in getting this polished, it will be great.
> >
> > To create your pairwise similarity matrix (0-1, where 1 means dissimilar;
> > it can be the other way around as well), see the DistanceMeasure
> > implementations. Creating the pairwise matrix is non-trivial from a
> > scalability standpoint.
> >
> > A complete spectral clustering package should take an input set of
> > documents, create the matrix, run clustering, and output the clusters.
> > To get an idea of your work till now, what are the blocks missing from
> > this ideal package scenario?
> >
> > Robin
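
To make the canopy suggestion in the reply above concrete, here is a rough sketch of how canopies could yield a partial connection graph instead of the full pairwise matrix. This is plain Python for illustration only, not Mahout's canopy implementation; the function names (`canopies`, `sparse_edges`), the Euclidean distance, and the thresholds `t1` and `t2` are all assumptions of this sketch. Only pairs of points that share a canopy get an entry, so far-apart points are never stored as edges.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies(points, t1, t2, distance=euclidean):
    """Group point indices into (possibly overlapping) canopies.

    t1 is the loose threshold, t2 the tight one (0 < t2 < t1).
    Hypothetical helper for illustration, not a Mahout API.
    """
    if not 0 < t2 < t1:
        raise ValueError("thresholds must satisfy 0 < t2 < t1")
    remaining = list(range(len(points)))  # candidate canopy centers
    result = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) < t1]
        result.append(members)
        # Points within the tight threshold can no longer become centers.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) >= t2]
    return result


def sparse_edges(points, t1, t2, distance=euclidean):
    """Return {(i, j): distance} only for pairs that share a canopy."""
    edges = {}
    for members in canopies(points, t1, t2, distance):
        # members is in ascending index order, so i < j for every pair.
        for i, j in combinations(members, 2):
            if (i, j) not in edges:
                edges[(i, j)] = distance(points[i], points[j])
    return edges


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    graph = sparse_edges(docs, t1=3.0, t2=1.5)
    print(len(graph), "edges instead of", len(docs) * (len(docs) - 1) // 2)
```

Note that this toy version still scans every point once per canopy center to find members, so it only shows the shape of the resulting graph. A scalable version would need a cheap approximate distance and a distributed pass for that step.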
